BACKGROUND OF THE INVENTION
Field of the Invention

The present invention relates to DNA sequences for the
human and rabbit enzymes which control the conversion of
high mannose to hybrid and complex N-glycans,
UDP-N-acetylglucosamine:α-3-D-mannoside
β-1,2-N-acetylglucosaminyltransferase I (GnT I), to plasmids
and transformed cells containing such DNA sequences, and to
the conversion of high mannose glycoproteins to branched
N-glycan glycoproteins.
Discussion of the Background

The biosynthesis of highly branched N- and [illegible].
[illegible] transformed either by polyoma virus or by Rous
sarcoma virus show a two-fold increase in one
N-acetylglucosaminyltransferase (GlcNAc-transferase V)
involved in the synthesis of highly branched complex
N-glycans (Pierce et al (1986) J. Biol. Chem., vol. 261,
10772-10777; Yamashita et al (1985) J. Biol. Chem.,
vol. 260, 3963-3969). [illegible] structure
Manα1-6(Manα1-3)Manβ1-4GlcNAcβ1-4GlcNAc-Asn [illegible].
The antennae are initiated by the action of at least five
Golgi-localized membrane-bound [illegible] designated
GnT I, II, IV, V and VI (Schachter et al (1989) Methods
Enzymol., vol. 179, [illegible]) [illegible] "bisected" by
a [illegible] GlcNAc-transferase III (GnT III).

The conversion of high-mannose to complex and hybrid
N-glycans is controlled by UDP-GlcNAc:α-3-D-mannoside
β-1,2-N-acetylglucosaminyltransferase I (GnT I,
EC 2.4.1.101), which catalyzes the reaction:

UDP-GlcNAc + (Manα1-6[Manα1-3]Manα1-6)(Manα1-3)Manβ1-4R →
(Manα1-6[Manα1-3]Manα1-6)(GlcNAcβ1-2Manα1-3)Manβ1-4R + UDP,

where R is GlcNAcβ1-4(±Fucα1-6)GlcNAc-Asn-X, and Asn-X may
be an Asn residue which is part of the amino acid sequence
of a protein.

The enzyme is specific for the Manα1-3Manβ1-4GlcNAc- arm
of the core. The presence of a β2-linked GlcNAc residue at
the non-reducing terminus of this arm is essential for
subsequent action of several enzymes in the processing
pathway (Schachter et al (1983) Can. J. Biochem. Cell Biol.,
vol. 61, 1049-1066; Schachter et al (1985)
"Glycosyltransferases involved in the biosynthesis of
protein-bound oligosaccharides of the
asparagine-N-acetyl-D-glucosamine and
serine (threonine)-N-acetyl-D-galactosamine types", in: A.N.
Martonosi, ed. The Enzymes of Biological Membranes, New
York, N.Y., Plenum Press, 227-277; Schachter (1986)
Biochem. Cell Biol., vol. 64, 163-181; Schachter (1988)
Biochimie, vol. 70(11), 1701-1702), i.e., GnT II, III and
IV require the prior action of GnT I, and GnT V and VI
require the prior action of GnT II. GnT I has been reported
in hen oviduct, Chinese hamster ovary cells, baby hamster
kidney cells, bovine colostrum, pig trachea and mammalian
liver (Schachter et al (1983) Can. J. Biochem. Cell Biol.,
vol. 61, 1049-1066; Schachter et al (1985)
"Glycosyltransferases involved in the biosynthesis of
protein-bound oligosaccharides of the
asparagine-N-acetyl-D-glucosamine and
serine (threonine)-N-acetyl-D-galactosamine types", in: A.N.
Martonosi, ed. The Enzymes of Biological Membranes, New
York, N.Y., Plenum Press, 227-277; Schachter et al (1980)
"Mammalian glycosyltransferases: their role in the synthesis
and function of complex carbohydrates and glycolipids", in:
Lennarz W.J., ed. Biochemistry of Glycoproteins and
Proteoglycans, New York, N.Y., Plenum Press, 85-160;
Brockhausen et al (1988) Biochem. Cell Biol., vol. 66,
1134-1151). The enzyme has been partially purified from
bovine colostrum (Harpaz et al (1980) J. Biol. Chem.,
vol. 255, 4885-4893) and from pig liver and trachea
(Oppenheimer et al (1981) J. Biol. Chem., vol. 256,
11477-11482), and to homogeneity from rabbit liver
(Oppenheimer et al (1981) J. Biol. Chem., vol. 256, 799-804;
Nishikawa et al (1988) J. Biol. Chem., vol. 263, 8270-8281).
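
The ordering constraints stated above (GnT II, III and IV act only after GnT I; GnT V and VI act only after GnT II) can be written down as a small dependency table. The following is only an illustrative sketch in Python: the names PREREQUISITES, can_act and one_valid_order are hypothetical and are not part of the disclosure, and the table encodes nothing beyond the two prerequisites stated in the text.

```python
from graphlib import TopologicalSorter  # Python 3.9 or later

# Direct prerequisites as stated in the text (illustrative only):
# GnT II, III and IV require prior action of GnT I;
# GnT V and VI require prior action of GnT II.
PREREQUISITES = {
    "GnT I": set(),
    "GnT II": {"GnT I"},
    "GnT III": {"GnT I"},
    "GnT IV": {"GnT I"},
    "GnT V": {"GnT II"},
    "GnT VI": {"GnT II"},
}


def can_act(enzyme, already_acted):
    """Return True if every direct prerequisite of `enzyme` has acted."""
    return PREREQUISITES[enzyme] <= set(already_acted)


def one_valid_order():
    """Return one enzyme order that respects all stated prerequisites."""
    return list(TopologicalSorter(PREREQUISITES).static_order())


if __name__ == "__main__":
    print(one_valid_order())
    print(can_act("GnT V", ["GnT I"]))  # GnT II has not acted yet
```

The sketch makes the central point of the paragraph explicit: every other enzyme in the table depends, directly or through GnT II, on the prior action of GnT I.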

Recently, the cloning of DNA encoding proteins and the
expression of such cloned DNA to produce the proteins have
become commercially important. For ease of culturing, it is
preferred that the cloned DNA be expressed in a primitive
host, such as a bacterium (e.g., E. coli), a yeast, or a
fungus. However, such primitive hosts may not normally
possess the enzymes required for the post-translational
modification of proteins which occurs in the cells from
which the DNA originated. Thus, although many primitive
hosts possess the necessary enzymes to effect the
post-translational modification of a protein to a high
mannose derivative, such hosts do not contain the enzyme
required to convert the high mannose derivative to a hybrid
and branched glycan, GnT I.

As discussed in Bergh et al, "Glycosylation of
Heterologously Expressed Proteins: Problems and Solutions",
in Therapeutic Peptides and Proteins: Assessing the New
Technologies, Marshak et al eds, Cold Spring Harbor
Laboratory, Banbury Report 29, 1988, in prokaryotes, the
resulting lack of glycosylation may have a variety of
consequences, such as incorrect polypeptide chain-folding,
precipitation and aggregation of the protein, proteolytic
degradation or enhanced immunogenicity.

Yeast and vertebrate cells use the same Glc3Man9GlcNAc2
lipid-linked precursor for cotranslational glycosylation of
asparagine residues, both recognize the same Asn-X-Ser/Thr
sequences, and both remove the three glucose residues soon
after transfer. Thus, a mammalian glycoprotein expressed in
yeast may contain the same carbohydrate chains as the native
protein until after it leaves the endoplasmic reticulum.
After entry into the Golgi, however, the later steps in
oligosaccharide processing are very different in yeast (see
Kukuruzinska et al, Ann. Rev. Biochem., vol. 56, p. 915,
1987) and vertebrates (see Hubbard and Ivatt, Ann. Rev.
Biochem., vol. 50, p. 555, 1981; Kornfeld and Kornfeld,
Ann. Rev. Biochem., vol. 54, p. 631, 1985). Processed
Saccharomyces cerevisiae N-linked oligosaccharides contain
two GlcNAc residues and from 9 to 50 or more mannose
residues. On the other hand, mammalian oligosaccharides
never have more than nine mannose residues and most commonly
contain GlcNAc, galactose, and sialic acid attached to a
Man3GlcNAc2 core.
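
The Asn-X-Ser/Thr recognition sequence mentioned above lends itself to a simple search over a protein sequence written in single-letter code. The following Python sketch is illustrative only: the function name find_sequons and the short test sequences are hypothetical, and X is taken to be any residue exactly as the text states it (stricter definitions of X found elsewhere are not assumed here).

```python
import re

# Asn-X-Ser/Thr in single-letter code: N, any residue, then S or T.
# A lookahead is used so that overlapping sites are all reported.
SEQUON = re.compile(r"N(?=[A-Z][ST])")


def find_sequons(protein):
    """Return 1-based positions of Asn residues in Asn-X-Ser/Thr sites."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


if __name__ == "__main__":
    # Arbitrary example sequence, not taken from the disclosure.
    print(find_sequons("MNGSAANATQ"))
```

Such a scan only identifies candidate asparagine residues; whether a site is actually glycosylated, and how the attached chain is processed, depends on the host cell as discussed above.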

Thus, heterologous expression in yeast of a mammalian
glycoprotein intended for therapeutic use can present a
number of potential glycosylation-related problems. For
example, carbohydrate chains may be highly antigenic; in
addition, they are recognized by Man/GlcNAc-specific
receptors on cells of the mammalian reticuloendothelial
system, resulting in rapid clearance of the glycoprotein
from the circulation.

Thus, it is desirable to: (1) provide large amounts of 
GnT I for the further post-translational modification of
recombinantly produced proteins; and (2) provide a means for 
enabling primitive hosts to express GnT I. 

However, as yet there are no methods available for 
obtaining large quantities of GnT I or enabling primitive 
hosts to express GnT I. 

SUMMARY OF THE INVENTION

Accordingly, it is an object of the present invention 
to provide a method for producing large quantities of GnT I. 

It is another object to provide a method for converting 
high mannose derivatives to hybrid and complex N-glycans. 

It is another object to provide isolated DNA sequences
which encode GnT I.

It is another object to provide plasmids which contain 
a DNA sequence which encodes GnT I. 

It is another object to provide microorganisms which 
contain a heterologous sequence of DNA which encodes GnT I. 

These and other objects, which will become apparent 
during the following detailed description, have been 
achieved by the inventors' isolation and cloning of DNA 
sequences encoding rabbit and human GnT I, preparation of 
plasmids containing such DNA sequences and transfection of 
microorganisms with such plasmids.

BRIEF DESCRIPTION OF THE DRAWINGS

A more complete appreciation of the invention and many 
of the attendant advantages thereof will be readily obtained 
as the same become better understood by reference to the 
following detailed description when considered in connection 
with the accompanying drawings, wherein: 

Figure 1 illustrates the amino acid sequence data for 
the eight peptides isolated from rabbit liver GnT I and 
nucleotide sequences of the six synthetic oligonucleotides 
prepared on the basis of the peptide sequences. The single
letter code is used for amino acid sequence data; upper case 
letters indicate firm assignments and lower case letters 
indicate tentative assignments. The underlined sections of 
the peptide sequences indicate the regions used for the 
design of oligonucleotide probes. Probes 2, 3 and 6 were 
based on peptides 2, 3 and 6, respectively; S indicates 
"sense" and A indicates "antisense" directions; 

Figure 2 illustrates a schematic representation of
GnT I clones. PCR product, product obtained by PCR
amplification of rabbit liver cDNA; rc1600, 1.6 kb GnT I
cDNA clone; rc2500, 3.0 kb GnT I cDNA clone. The shaded
boxes represent the coding region. During subcloning, the
3.0 kb cDNA was reduced to 2.5 kb by a 0.5 kb deletion at
the 5'-end;

Figure 3 illustrates the results of an agarose gel
electrophoresis (1% agarose) of the products of the
polymerase chain reaction (PCR) using rabbit liver cDNA as
template and the following combinations of oligonucleotides
as primers: 2S-3A; 2S-6A; 3S-2A; 3S-6A; 6S-2A; 6S-3A
(Figure 1). Conditions of PCR are given in the Methods
section. The gel was stained with ethidium bromide
(0.5 ng/ml). Primer-dependent products were obtained with
combinations 2S-6A (0.50 kb) and 3S-6A (0.45 kb). The arrow
designates the 0.5 kb DNA marker; the remaining standards
are at 1.0 kb, 1.6 kb, 2.0 kb and at 1.0 kb intervals
thereafter;

Figure 4 illustrates the nucleotide sequence (lower 
case) of the 2.5 kb GnT I cDNA clone. The amino acid 
sequence in the coding region is shown in upper case 
letters. The positions of the eight peptide sequences 
obtained from proteolytic digests of GnT I (Figure 1) are 
underlined with a single solid line; the regions of these 
peptide sequences used for oligonucleotide probe synthesis 
(Figure 1) are additionally underlined with a discontinuous 
line. The putative transmembrane segment (bases 62-136) is
underlined with a double line. The consensus 
polyadenylation signal AATAAA at position 2435 is 
underlined. Only the nucleotide sequence is numbered; 

Figure 5 illustrates an autoradiogram of an
SDS-polyacrylamide gel electrophoresis experiment showing in
vitro transcription and translation of the rabbit cDNA.
mRNA was generated from the 2.5 kb GnT I cDNA and was used
as the template for in vitro translation using rabbit
reticulocyte lysate and L-[35S]-methionine (see Methods for
details). Lane C, no plasmid in the incubation; lane 12,
pGEM-7z containing the 2.5 kb GnT I cDNA with an insert
between bases 56 and 57 which interrupts the reading frame;
lane 16, pGEM-7z containing the 2.5 kb GnT I cDNA
(pGEM-7z-rcgntl);

Figure 6 illustrates the nucleotide sequence for human
genomic DNA encoding GnT I;

Figure 7 illustrates the amino acid sequence for human
GnT I; and

Figure 8 illustrates both the nucleotide sequence for
human genomic DNA encoding GnT I and the amino acid
sequence of human GnT I.
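
In the description of Figure 1, S and A denote oligonucleotides written in the "sense" and "antisense" directions. An antisense oligonucleotide is the reverse complement of the corresponding sense-strand sequence, which can be sketched in Python as follows. The function name reverse_complement is hypothetical and the example sequence is arbitrary; it is not one of the probes of Figure 1.

```python
# Watson-Crick complements for lower- and upper-case bases.
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq):
    """Return the antisense (reverse-complement) strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    sense = "atggcc"  # arbitrary illustrative sequence
    print(reverse_complement(sense))
```

Applying the function twice returns the original sequence, which is a quick consistency check when pairing a sense primer from one peptide with an antisense primer from another, as in the primer combinations listed for Figure 3.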

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 
Thus, one aspect of the present invention relates to 
isolated DNA sequences which encode rabbit GnT I.
Specifically, such DNA sequences encode a protein having the
sequence (starting from the N-terminus) of formula I shown
below:

ARG THR ASN ASP ARG LYS GLU LEU GLY GLU VAL ARG VAL GLN TYR
THR GLY ARG ASP SER PHE LYS ALA PHE ALA LYS ALA LEU GLY VAL 
MET ASP ASP LEU LYS SER GLY VAL PRO ARG ALA GLY TYR ARG GLY 
ILE VAL THR PHE LEU PHE ARG GLY ARG ARG VAL HIS LEU ALA PRO 
PRO GLN THR TRP ASP GLY TYR ASP PRO SER TRP THR 

In another aspect, the present invention relates to DNA
sequences which encode human GnT I. Such DNA sequences
encode a protein having the sequence (starting from the
N-terminus) of formula II shown below:

 46: ??? ??? ??? ??? LEU ??? ARG GLU VAL ILE ARG LEU ALA GLN ASP
 61: ALA GLU VAL GLU LEU ??? ARG ARG ??? GLY LEU LEU GLN GLN ILE
 76: GLY ??? ALA LEU ??? ??? GLN ARG GLY ARG VAL PRO THR ALA ALA
 91: PRO PRO ALA GLN PRO ARG VAL PRO VAL THR PRO ALA PRO ALA VAL
106: ILE PRO ILE LEU VAL ILE ALA CYS ASP ARG SER THR VAL ARG ARG
121: CYS LEU ASP LYS LEU LEU HIS TYR ARG PRO SER ALA GLU LEU PHE
136: PRO ILE ILE VAL SER GLN ASP CYS GLY HIS ??? ??? ??? ALA GLN
151: ALA ILE ALA SER TYR GLY SER ALA VAL THR HIS ILE ARG GLN PRO
166: ASP LEU SER SER ILE ALA VAL PRO PRO ASP HIS ARG LYS PHE ???
181: GLY TYR TYR LYS ILE ALA ARG HIS TYR ARG TRP ALA LEU GLY GLN
196: VAL PHE ARG GLN PHE ARG PHE PRO ALA ALA VAL VAL VAL GLU ???
211: ASP LEU GLU VAL ALA PRO ASP PHE PHE GLU TYR PHE ARG ALA THR
226: TYR PRO LEU LEU LYS ALA ASP PRO SER LEU TRP CYS VAL SER ALA
241: TRP ASN ASP ASN GLY LYS GLU GLN MET VAL ASP ALA SER ARG PRO
256: GLU LEU LEU TYR ARG THR ASP PHE PHE PRO GLY LEU GLY TRP LEU
271: LEU LEU ALA GLU LEU TRP ALA GLU LEU GLU PRO LYS TRP PRO LYS
286: ALA PHE TRP ASP ASP TRP MET ARG ARG PRO GLU GLN ARG GLN GLY
301: ARG ALA CYS ILE ARG PRO GLU ILE SER ARG THR MET THR PHE GLY
316: ARG LYS GLY VAL THR HIS GLY GLN PHE PHE ASP GLN HIS LEU LYS
331: PHE ILE LYS LEU ASN GLN GLN PHE VAL HIS PHE THR GLN LEU ASP
346: LEU SER TYR LEU GLN ARG GLU ALA TYR ASP ARG ASP PHE LEU ALA
361: ARG VAL TYR GLY ALA PRO GLN LEU GLN VAL GLU LYS VAL ARG THR
376: ASN ASP ARG LYS GLU LEU GLY GLU VAL ARG VAL GLN TYR THR GLY
391: ARG ASP SER PHE LYS ALA PHE ALA LYS ALA LEU GLY VAL MET ASP
406: ASP LEU LYS SER GLY VAL PRO ARG ALA GLY TYR ARG GLY ILE VAL
436: THR TRP GLU GLY TYR ASP PRO SER TRP ASN

Exemplary of the DNA sequences encoding rabbit GnT I is
the sequence (starting from the 5'-terminus) of formula III,
shown below:

atg ctg aag aag cag tct gct ggg ctt gtg ctg tgg ggt gct atc
ctn ttn gtg gcc tgg aat gcc ctg ctg ctc ctc ttc ttc tgg aca
cgt cca gtg cct agc agg ctg ccg tca gac aat gct ctc gat gat
gac cct gcc agc ctc acc cgt gag gtg atc cgc tta gct cag gat
gcc gag gta gag ttg gaa cgt cag cgg gga ctg ttg cag cag att
agg gag cac cat gct ctt tgg agc cag cgg tgg aag gtg cct act
gca gcc cct cct gct cag ccg cat gtg cct gtg acc cca ccg cca
gct gtg atc ccc atc ctg gta att gcc tgt gac cgc agc acc gtc
ngc ngc tgt ttg gac aag cta ctg cat tat cgg cct tca gct gag
ctg ttc ccc atc att gtc agc cag gac tgt ggg cat gag gag aca
gcc cag gtc att gct tcc tat ggc agc gca gtc aca cac atc cgg
caa cct gac ctg agc aac att gct gtg cag ccc gac cac cgc aag
ttc cag ggc tac tac aag atc gca cgg cat tac cgc tgg gca ttg
ggc can nnc ttc cac aat ttc aac tac cca gca gct gtg gtg gtg
gag gat gat ctc gag gtg gca cca gac ttc ttt gag tac ttc cag
ncc act tac cca ctg ttg aaa gca gac ccc tcc ctc tgg tgt gtg
nnt gcc tgg aat gac aat ggc aaa gaa cag atg gta gac tcg agt
aag cca gag tta ctc tac cgc aca gat ttc ttt cct ggc tta ggc
tgg tta ctg ttg gct gaa ctc tgg gct gaa ctg gag ccc aag tgg
ccc aaa gcc ttc tgg gat gac tgg atg cgc cgg cct gag cag cga
aag ggg agg gcc tgt gtg cgt cca gaa atc tca aga aca atg aca
ttn ggc cgg aag ggt gtg agc cat ggg cag ttc ttt gac cag cat
ctc aag ttc atc aag ctg aac cag cag ttt gta ccc ttc acc cag
ctg gac ctg tcg tac ctt cag cag gag gcc tat gac cgg gat ttc
ctt gct cgt gtt tat ggt gct ccc cag tta cag gtg gag aaa gtg
agg acc aat gac cgg aag gag cta gga gag gtg cgc gta cag tac
aca ggc agg gac agc ttc aag gct ttc gcc aag gcc ctg ggt gtc
atg gat gac ctc aaa tca ggt gta ccc agg gct gga tac cgg ggc
att gtc acc ttc tta ttc cgg ggc cgc cgt gtc cac ctg gcg ccc
cct cag act tgg gat ggc tat gat cct agt tgg act

The DNA sequence of formula III corresponds to the
coding region of rabbit cDNA encoding GnT I. Another
example of a DNA sequence encoding rabbit GnT I is a larger
section of cDNA encoding rabbit GnT I, which has the formula
IV as shown below:

..rr^n^r^c^a tttqcctgcc ctcccctgtg ggggccagg 
1 aaattccggc aagrcaracc -cT^uy^-^v-v^ - ^x.^ 

atg ctg aag aag cag tct gct ggg ctt gtg ctg tgg ggt gct atc
ctn ttn gtg gcc tgg aat gcc ctg ctg ctc ctc ttc ttc tgg aca
cgt cca gtg cct agc agg ctg ccg tca gac aat gct ctc gat gat
gac cct gcc agc ctc acc cgt gag gtg atc cgc tta gct cag gat
gcc gag gta gag ttg gaa cgt cag cgg gga ctg ttg cag cag att
agg gag cac cat gct ctt tgg agc cag cgg tgg aag gtg cct act
gca gcc cct cct gct cag ccg cat gtg cct gtg acc cca ccg cca
gct gtg atc ccc atc ctg gta att gcc tgt gac cgc agc acc gtc
ngc ngc tgt ttg gac aag cta ctg cat tat cgg cct tca gct gag
ctg ttc ccc atc att gtc agc cag gac tgt ggg cat gag gag aca
gcc cag gtc att gct tcc tat ggc agc gca gtc aca cac atc cgg
caa cct gac ctg agc aac att gct gtg cag ccc gac cac cgc aag
ttc cag ggc tac tac aag atc gca cgg cat tac cgc tgg gca ttg
ggc can nnc ttc cac aat ttc aac tac cca gca gct gtg gtg gtg
gag gat gat ctc gag gtg gca cca gac ttc ttt gag tac ttc cag
ncc act tac cca ctg ttg aaa gca gac ccc tcc ctc tgg tgt gtg
nnt gcc tgg aat gac aat ggc aaa gaa cag atg gta gac tcg agt
aag cca gag tta ctc tac cgc aca gat ttc ttt cct ggc tta ggc
tgg tta ctg ttg gct gaa ctc tgg gct gaa ctg gag ccc aag tgg
ccc aaa gcc ttc tgg gat gac tgg atg cgc cgg cct gag cag cga
aag ggg agg gcc tgt gtg cgt cca gaa atc tca aga aca atg aca
ttn ggc cgg aag ggt gtg agc cat ggg cag ttc ttt gac cag cat
ctc aag ttc atc aag ctg aac cag cag ttt gta ccc ttc acc cag
ctg gac ctg tcg tac ctt cag cag gag gcc tat gac cgg gat ttc
ctt gct cgt gtt tat ggt gct ccc cag tta cag gtg gag aaa gtg
agg acc aat gac cgg aag gag cta gga gag gtg cgc gta cag tac
aca ggc agg gac agc ttc aag gct ttc gcc aag gcc ctg ggt gtc
atg gat gac ctc aaa tca ggt gta ccc agg gct gga tac cgg ggc
att gtc acc ttc tta ttc cgg ggc cgc cgt gtc cac ctg gcg ccc
cct cag act tgg gat ggc tat gat cct agt tgg act
ca.caect:c= tgcccgtcco ttccgggccc cttccccsca atttcatgat ""S"|8J 

bb: ^ ^ ^ iSS 

r4:sr. :c„..i.i 
aggtcacccg cagaggagct
ccgagagcca gggcatgaga 
tctgggctac aggagaagtg 
attgtcagag tctaggtgca 
cctcattcct gactnctgtc 
gatgaaaaat gaagaggaaa 
aaaaaaactt cattgtacca 
agtcccctcc ctgttgcttc 
ccctgagttt tatacacagg 
taagttacag tggccaagac 
tccaaaacat tcccatgtcc 
cgtgtgtgtg tgrgtgtgtg 
ggaagctaca gaattatttzt 
aaaaaaccgg aaccc 



ggcaacttta gccttcntaa 
ccccttgttc atgcrccttt 
aacatattgt ggccagaata 
gttattgggt tgtcagagtt 
agctcttctt tctttgcagc 
agaaatattc gcacccagct 
cttcaaagag acactcttga 
aggagaatgc tgcgctggtc 
ctcctcccta aggctgtggc 
caggacaact ccggccatga 
tcacaggcta ggatgcagat 
tgtgttttct tgcctgacct 
caaaaataaa ggctgaattg 



ccaggccntc tctttctgac 
ctaccttccc ctaataaggg 
atactaacca gaggggcctc 
aatgccttct gttcttcttt 
ctagcaattt ttggttctaa 
attgggagaa aggtagtggg 
cctcttcctt tctaaaaatc 
agttctgtgt gatccttctt 
ttctggtggc cctcctgaca 
gctaagtcct gcctaccttc 
gttggttgga gaggaatttg 
cagtttcatg gatgaaaagt 
tccgaaaaaa aaaaaaaaaa 



The DNA sequences of formulae III and IV have been
obtained by cloning the rabbit cDNA encoding GnT I by the
procedure which is described in detail in the Examples
section.

Exemplary of the DNA sequences encoding human GnT I is
the sequence (starting at the 5'-terminus) of formula V,
shown below:

r/£S iriS SS iHEi 

L141 cgagccagcg ggggagggcg cccaccgcgg ^ | tgaccgcagc actgttcggc 

1201 cccccgcgcc ggcggtgatt ^"""^c ccrcgl"gf gLcrLccc arcaccgrra 
1261 gct:gccrgga caagctgccg "^atcggc | ctcctacggc agcgcggtca 

1321 gccaggactg cgggcacgag gagacggccc ^Sg« g cgcaagttcc 
1381 cgcacatccg gcagcccgac ctgagcagca Saggtc rtccggcagr 

1.41 agggctacra caagatcgcg cgccactacc g^SSg^S" !!Sgcc«g gacrSrtcg 
1501 .«gc«ccc cgcggccgtg gt|gt|gagg l^^l^llll |?:|Lgt:g| ^gcgccrcgg - 
1561 agtacnttcg ggccacctat ccgctgccga *8S » caeecctgag ctgctctacc 

1801 ggcgggcctg catacgccct gagatctcaa 8" 8 6 cagtttgtgc 
1861 cgcacgggca g"cttrgac cagcacctca -|«-«-^ ^L^gaccg! gaL.ccncg 
1921 acctcaccca gctggacctg cctcacctgc SS * gaccggaagg 

1981 cccgcgccta cggtgctccc cagcrgcagg ^^^^H ^^^aaggct: Itcgccaagg 

2041 agctggggga gg^S=gSS^| crSa^cS HgSLgai agctggctac cggggratcg 
2101 ccctgggtgt tatggatgac cttaagccgg SS*^ B b ° Z reezagggct 
2161 tcacctccca gttccggggc cgccgtgtcc acctggcgcc cccaccgacg rggg ggg 

2221 atgatcctag ctggaat 

The DNA sequence of formula V corresponds to the coding 
region of human genomic DNA encoding GnT I. Another example 
of a DNA sequence encoding human GnT I is a larger section

SUBSTITUTE SHEET 



PCT/CA91/00417

WO 92/09694 

- 12 - 

of human genomic DNA encoding GnT I, which has the formula
VI, shown below:

^act-tccaaa tattccctca tctccctggc

1 aagttctgaa tgtrtaagtt "ctccccac gagcratttg tggacacgaa 

51 rtLgtaagc agggctt.ct "rccatg^ c.go.cagc -atta.cga 

121 ggctatccac cagtacatgc ^S*"""^ caaecaaaga taacagaaat agtrrgcCuC 
131 ftcct«cga gcr«ccagg -rattctca caagta^^g^ ,g,,,,ggct gc.rccncca 
241 ccttccacri: ccgcr«gaa ggcfacttcg ttttgctcct gacatccgcg 

301 gcaaaatrgtc aaacaaccct SSJg^^Satg tagccctgag gcgggaca.. 

361 gggcgcctcc ggcgccrccc cgttggtaag ^ % ggaacggcag gcagacgcrg 

'21 S!t:t«aaa aaccagt:cat -^"gSSS^S^ actSgagig gaaacgaaac crgacaggra 
111 ccac«c«g ccccrcccct: HHllolll cagcancact c.c.gcc.cc 

541 gaaagagggg agttgggg" <^«"^^!" tatgccgacC ccctagagag ctctggagcc 
Tol cccccaSL cacgrt:cccg gg«;|8^^^ ^gg^cnlrrc ccgcccccat: "ccrctacc 
661 aaccccctgg ccttccccca glcggrrtgc ttrcrcrctc ctgt:c«tag 

721 tgtggggcat ggagccacga g«"=S^?^ SaSaggca tccctaggac cgcgggcaag 
78?: gIgcSlgcr gcccccraat: -"^^I" grcclS|gc Cgccccccg gtSS|2f =J 
841 ggagccgcaa gcccagggca gcctrgaacc S cgccatccrc ttrgtggcct 

901 fgargc'gaa gaagcagcc. :rcgSfgc acc.ggcagg -accc-ag 

961 esaatgcccc gccgcccccc tccttccgga » | agtgartcgc ctggcccaag 

102^ ccagcjcrct cgacggcgac "Cgccagcc "--"^gg ^^^^^^ gatgccccgt 

iSsi acgcclaggc ggagc.ggag cg-gj^«^| f/c^.cccgc ccagccgcg. .t:gcccg.ga 
1141 cgagccagcg ggggaggg^g „atcgcctg cgaccgcagc actgcccggc 

1201 cccccgcgcc ggcggcgatr gctc«cccc ^"accgtra 

19 61 actzcccgga caagccgctg cactaccggc *^ ctcctacggc agcgcggcca 

1321 iSgglSg cgggcacgag gagacggccc aggcca.cgc egcaag«cc 
U81 IgcalLccg gcagoccgac ^^^S-^cagca iggccaggrc trccggcagt 

^441 alggctacta caagarcgcg cgccactacc ^^l^l gg^ggccocg g-"""^^ 
1501 ttcgctcccc cgcggccgcg S^gS^Sfagg J 8 * ctccctgtgg tgcgtcccgg 
1561 agtLr«cg ggccaccrar ^^^g^^^^^e tSac|ccag caggcctgag crgctctacc 
7621 cctggaarga caacggcaag gagcagatgg | cgagctcrgg gccgagcngg 

i si TcaSgacct: t:«ccccggc -gf c.^^^^^ Sfrfcg g!g|ccggag cagcggcagg 
1741 agcccaagtg gccaaaggcc "'^^SSS^^ eaacgatgac ctttggccgc aagggtgtga 
1801 g|cgggcct:g ca.acgccc. 82"::^ IgttLcLa gctgaaccag cagrttgtgc 
1861 cgcacgggca gttctttgac "8""^" aecgggaggc ctatgaccga gattrcctcg 
1921 acctcaccca gctggacctg SSalgt: gaggaccaat gaccggaagg 

1981 cccgcgtcra cggcgcccco cagcrg=-Sg |g |«caaggc. trcgccaagg 

2041 agctggggga ggtgcgggrg SgSccgag agctggctac cggggtarcg 

2101 ccctgggtgt catggatgao "^^^I^^^J IStggogcc cccaccgacg tgggagggc. 
2161 ecaccrccca g«ccggggo cg-g-|«^ ^..^Sclgg gcccc«c« g«---- 
2221 acgatcctag crggaattag ^ accggcctgc ctgtgcttcc ctcctaggcg 

2231 gagctgaggc gaccacagcc cccaggcrgc ^ ^^^^ caagagga« 

2341 SLrLSt: ..rga^rrr ccgag-g^- ^^^t^L. caggg.acg «S=S8gg« 
2401 atccrcccgr tcccaaggga g^=^8^"^| SItgggccc gtrggggcca caaatgtcca 
2461 «aagcagga aaacacrgtg ^ggrgggggS "^^|g^gg Lacgr^gc tctcrrgacc 
2521 cgccct:sagc cct:c««gg -g^^^^gca ^ J gccctcccrc nacacccgcr 
2581 agacccccrc cccctgaccg g««""? LaSegaca taccngtggc caaaacgaca 
2641 cLcrrccca grggggacrg -|«-^Sgga grggst:ca« ggggcccacc 

2701 craaccaaag gggcCCcc^ |^"S|8CCC S| 8^ |^^8 ^^^^^^^^^^ gcagcctagc 
2761 gcctccrgcc ccccnorcct ggggcaagca agacctctcc tcagcccatg 

2821 Ig^ratagt tci:gagat:gg !S|ccrtgr gctgggacaa ccrc.ccc« 

2381 cccagctgrc aggagagagg ^|"888^|| cclctccrtt ctgaaaarca gtgccccccc 
2941 gccccaccct: cagagaggac tacgccccga " aatccgaccr gccrgtcccr 

3001 Igcgcncca ggagg"-" iTclTzl ''^ IZIU^ aggcggagca gcgaccaggc 
3061 cccrcccctg ggg«t:gaca cacaggcc 


3L21 ggagcagcga ccaggacgcc tccggcccag cgctgcccag cccccccgcc cgcccccagg 
3181 cgccccatgc cctcacaggc caggacgcca tggcggccgg gagcatgcga 

The DNA sequences of formulae V and VI have been 
obtained by cloning human genomic DNA encoding GnT I, by the 
procedure which is described in detail in the Examples 
section.

Of course, it is to be understood that the present DNA 
sequences also include those which may not exactly match the 
sequences of formulae III-VI, but rather contain a small 
number of nucleotide substitutions, deletions, and/or 
additions. Further, the present DNA sequences also include 
those which encode for amino acid sequences which may not 
exactly match the sequences of formulae I and II, but rather 
contain a small number of amino acid residue substitutions, 
deletions, and/or additions, provided that the protein
encoded by the DNA sequence exhibits GnT I activity. 
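
The correspondence between the DNA sequences of formulae III-VI and the amino acid sequences of formulae I and II follows the standard genetic code. As a minimal sketch, the Python below translates a run of codons into three-letter residue names; the names CODON_TABLE, THREE_LETTER and translate are hypothetical, and the input shown is the final twelve codons of formula III.

```python
from itertools import product

BASES = "tcag"
# Standard genetic code, codons enumerated as ttt, ttc, tta, ttg, tct, ...
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
    "*": "***",  # stop codon
}


def translate(dna):
    """Translate whitespace-separated or contiguous codons (a, c, g, t)."""
    seq = "".join(dna.lower().split())
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of three")
    return " ".join(
        THREE_LETTER[CODON_TABLE[seq[i:i + 3]]]
        for i in range(0, len(seq), 3)
    )


if __name__ == "__main__":
    tail = "cct cag act tgg gat ggc tat gat cct agt tgg act"
    print(translate(tail))
```

For this input the result is PRO GLN THR TRP ASP GLY TYR ASP PRO SER TRP THR, which agrees with the last line of formula I. Because several codons encode the same residue, a nucleotide substitution of the kind contemplated above may leave the encoded amino acid sequence unchanged.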

In another embodiment, the present invention relates to
plasmids which contain a DNA sequence encoding rabbit or
human GnT I. Such plasmids may be prepared by conventional
techniques and include plasmids formed by inserting one of
the present DNA sequences into any suitable plasmid.
Specific examples of the present plasmids include
pGEM-7z-rcgntl, in which a 2.5 kb sequence of rabbit cDNA
encoding GnT I (Figure 2) has been inserted into
pGEM-7z; pGEX-2t-rcgntl, in which a 2.5 kb sequence of
rabbit cDNA encoding GnT I has been inserted into pGEX-2t;
and pGEM-5z-hggntl, in which a 4 kb sequence of human
genomic DNA encoding GnT I has been inserted into pGEM-5z.
The preparation of the plasmids pGEM-7z-rcgntl,
pGEX-2t-rcgntl, and pGEM-5z-hggntl is described in detail in
the Examples section, and all three of these plasmids have
been deposited under the provisions of the Budapest Treaty
with the American Type Culture Collection, 12301 Parklawn
Drive, Rockville, MD 20852, USA on November 30, 1990
(Accession numbers not yet known).

In another embodiment, the present invention relates to
transformed microorganisms which contain a heterologous
sequence of DNA encoding rabbit or human GnT I. Examples of
suitable host cells include: bacteria, such as E. coli,
Brevibacteria, and Coryneforms; fungi, such as Trichoderma
reesei, Aspergillus niger, and Aspergillus awamori; yeast,
such as Saccharomyces cerevisiae, Candida albicans, Candida
utilis, Candida parapsilosis, Schizosaccharomyces pombe,
Bandeiraea simplicifolia, Kluyveromyces lactis,
Saccharomyces kluyveri, Hansenula, Saccharomycodes and
Pichia; and vertebrate cells such as Chinese hamster ovary
cells and COS cells. The transformed cells may be prepared
by transfecting the cells with any of the present plasmids
by conventional methods.

Another aspect of the present invention relates to
methods for the production of GnT I. In a first embodiment,
the present method comprises cell-free or in vitro
expression of one of the present DNA sequences to obtain
GnT I. For example, in vitro transcription and translation
of one of the present plasmids using a system such as
described in Methods in Molecular Biology, Nucleic Acids,
Walker, ed., Humana Press, Clifton, NJ, pp 145-155 (1984)
yields GnT I.

In another embodiment, the present method comprises
culturing a microorganism which contains a heterologous 
DNA sequence which corresponds to one of the present DNA 
sequences. Although the culturing conditions, such as time, 
medium, temperature, light, and agitation, will depend on 
the identity of the host microorganism and the yield of 
GnT I desired, these conditions are readily determined by 
those skilled in the art. 

In a further aspect, the present invention relates to a 
method for converting a glycoprotein which is in the high 
mannose form to a glycoprotein which is in the form of a 
hybrid or complex N-glycan. In a first embodiment, the 
present method may be carried out by reacting, in vitro, a 
glycoprotein which is in the high mannose form with 
mannosidases followed by UDP-GlcNAc in the presence of 
GnT I. 

In another embodiment, the present method may comprise
culturing a cell which produces a glycoprotein in the high
mannose form and which also contains a heterologous sequence
of DNA encoding human or rabbit GnT I. For example,
transfection of a cell, which normally produces a
glycoprotein in a high mannose form, with one of the present
plasmids may be used to form a cell which produces the
protein (produced in high mannose form before transfection)
as a hybrid or complex N-glycan. Preferably, the
glycoprotein, which is produced in the high mannose form
prior to transfection with the present DNA, is also produced
by the host cell as a result of transformation. In other
words, the DNA encoding the glycoprotein is also
heterologous with respect to the host cell.


Examples of such glycoproteins are described in Tanner
et al, Biochim. Biophys. Acta, vol. 906, pp. 81-99
(1987); and Kukuruzinska et al, Ann. Rev. Biochem., vol. 56,
pp. 915-944 (1987) and include SUC 2, CSF, c-IgM μ-chain,
c-IgM chain, c-amylase, c-HBsAg, c-hemagglutinin, c-α1
antitrypsin, c-pre-α1 antitrypsin, c-glucoamylase, c-VSV gp,
c-Sindbis virus E1 gp, c-Sindbis virus E2 gp,
c-killer protoxin (type I), c-phaseolin α and β, hepatitis
virus surface antigen, interferon-gamma, tissue plasminogen
activator, monoclonal antibodies, chicken ovalbumin-like
proteins, interleukin-2, and proteins from vesicular
stomatitis, influenza, and Semliki Forest viruses.

As noted above, branched glycans on membrane
glycoproteins have been implicated in a variety of
biological phenomena, e.g. tumor progression and metastasis,
embryogenesis, cell differentiation, cell-cell and
receptor-ligand interactions, viral and bacterial
infectivity, fertilization and the control of the immune
system (Rademacher et al (1988) Ann. Rev. Biochem., vol. 57,
785-838; Pierce et al (1986) J. Biol. Chem., vol. 261,
10772-10777; Yamashita et al (1985) J. Biol. Chem.,
vol. 260, 3963-3969; Schachter (1986) Biochem. Cell Biol.,
vol. 64, 163-181; West (1986) Mol. Cell. Biochem., vol. 72,
1273-1281; Dennis et al (198...)).

GnT I catalyzes an essential step in the conversion of
high mannose to hybrid and complex N-glycans
(Schachter (1986) Biochem. Cell Biol., vol. 64, 163-181;
Brockhausen et al (1988) Biochem. Cell Biol., vol. 66,
1134-1151). Expression of the 2.5 kb cDNA reported in this
paper results in GnT I activity, demonstrating the cloning
of the catalytic domain of this important enzyme. At
least seven glycosyltransferases involved in the
synthesis of N- and O-glycans have been cloned to date:
UDP-Gal:GlcNAc-R β1,4-Gal-transferase (Appert et al
(1986) Biochem. Biophys. Res. Commun., vol. 139, 163-168;
D'Agostaro et al (1989) Eur. J. Biochem., vol. 183, 211-217;
Masri et al (1988) Biochem. Biophys. Res. Commun., vol. 157,
657-663; Narimatsu et al (1986) Proc. Nat. Acad. Sci. USA,
vol. 83, 4720-4724; Shaper et al (1986) Proc. Nat. Acad.
Sci. USA, vol. 83, 1573-1577; Shaper et al (1988) J. Biol.
Chem., vol. 263, 10420-10428; Nakazawa et al (1988)
J. Biochem. (Tokyo), vol. 104, 165-168), UDP-Gal:Gal-R
α1,3-Gal-transferase (Joziasse et al (1989) J. Biol. Chem.,
vol. 264, 14290-14297; Larsen et al (1989) Proc. Natl. Acad.
Sci. USA, vol. 86, 8227-8231; Larsen et al (1990) J. Biol.
Chem., vol. 265, 7055-7061; Smith et al (1990) J. Biol.
Chem., vol. 265, 6225-6234), CMP-sialic acid:Gal-R
α2,6-sialyltransferase (Weinstein et al (1987) J. Biol.
Chem., vol. 262, 17735-17743), CMP-sialic acid:Gal-R
α2,3-sialyltransferase (Paulson et al (1990) FASEB J.,
vol. 4, A1862), GDP-Fuc:Galβ1,4(3)GlcNAc-R (Fuc to
GlcNAc) α1,3(4)-Fuc-transferase (Gersten et al (1990)
FASEB J., vol. 4, A1930; Kukowska-Latallo et al (1990)
FASEB J., vol. 4, A1930), GDP-Fuc:β-galactoside
α1,2-Fuc-transferase (Rajan et al (1989) J. Biol. Chem.,
vol. 264, 11158-11167; Ernst et al (1989) J. Biol. Chem.,
vol. 264, 3436-3447) and
UDP-GalNAc:Fucα1,2Gal-R (GalNAc to Gal)
α1,3-GalNAc-transferase (Yamamoto et al (1990) J. Biol.
Chem., vol. 265, 1146-1151). These transferases all place
sugars in terminal or subterminal positions; three of them
(β1,4-Gal-, α2,6-sialyl-, and α1,3-GalNAc-transferases) have
been localized to the trans-Golgi cisternae and trans-Golgi
network, at least in some tissues (Roth et al (1982) J. Cell
Biol., vol. 92, 223-229; Roth (1984) J. Cell Biol., vol. 98,
399-406; Roth (1987) Biochim. Biophys. Acta, vol. 906,
405-436; Roth et al (1988) Eur. J. Cell Biol., vol. 46,
105-112; Duncan et al (1988) J. Cell Biol., vol. 106,
617-628; Lee et al (1989) J. Biol. Chem., vol. 264,
13848-13855; Tooze et al (1988) J. Cell Biol., vol. 106,
1475-1487; Berger et al (1985) Proc. Nat. Acad. Sci. USA,
vol. 82, 4736-4739; Taatjes et al (1988) J. Biol. Chem.,
vol. 263, 6302-6309). Human α1,3-GalNAc-transferase and a
human pseudogene showing homology to murine α1,3-Gal-
transferase share 55% homology (Larsen et al (1990)
J. Biol. Chem., vol. 265, 7055-7061). CMP-sialic acid:Gal-R
α2,6- and α2,3-sialyltransferases exhibit 50% identity and
80% conservation over a 50 amino acid stretch (Paulson et al
(1990) FASEB J., vol. 4, A1862). The remaining transferases
share no significant sequence similarities but have very
similar domain structures, i.e., a short amino-terminal
cytoplasmic tail, a 16-20 amino acid transmembrane segment
(non-cleavable signal-anchor domain), a "stem" or "neck"
region of undetermined length, and a long carboxy-terminal
catalytic domain which is in the Golgi lumen (Paulson et al
(1989) J. Biol. Chem., vol. 264, 17615-17618).

The presence of a "neck" region is based on the finding
that the α2,6-sialyltransferase (Weinstein et al (1987)
J. Biol. Chem., vol. 262, 17735-17743; Lammers et al (1988)
Biochem. J., vol. 256, 623-631) and the β1,4-Gal-transferase
(D'Agostaro et al (1989) Eur. J. Biochem., vol. 183,
211-217) can be cut by proteases to release a smaller
catalytically active protein lacking the trans-membrane
domain. The exact length of this "neck" region cannot be
stated with accuracy since it is not known how much of the
amino-terminal sequence can be removed without loss of
catalytic activity. It has been shown that rabbit liver
GnT I (Nishikawa et al (1988) J. Biol. Chem., vol. 263,
8270-8281) and rat liver UDP-GlcNAc:α-6-D-mannoside β-1,2-N-
acetylglucosaminyltransferase II (GnT II) (Bendiak et al
(1987) J. Biol. Chem., vol. 262, 5784-5790; Bendiak et al
(1987) J. Biol. Chem., vol. 262, 5775-5783) exist in two
forms, a large amount of presumably membrane-bound material
which does not adhere to columns and a small amount of
material which can be purified. In the case of GnT I, it is
now clear from the sequence analysis that the 45 kDa form of
the catalytically active protein previously purified has
been derived from the membrane-bound precursor by
proteolytic cleavage at about base position 215 in the
"neck" region (Figure 4). The N-terminal blockage of this
45 kDa protein must therefore be due to chemical
modification during GnT I purification. The hydrophobic
trans-membrane region can form an α-helix with a hydrophobic
surface capable of interacting with the membrane or with
other hydrophobic proteins within the membrane. This strong
hydrophobic interaction may explain why it is so difficult
to purify glycosyltransferase preparations with intact
trans-membrane domains.

Rabbit GnT I, human, mouse and bovine UDP-Gal:GlcNAc-R
β1,4-Gal-transferases and human UDP-GalNAc:Fucα1,2Gal-R
(GalNAc to Gal) α1,3-GalNAc-transferase have an abnormally
high number of Pro residues between the transmembrane domain
and the catalytic domain, e.g., there are 13 Pro residues in
GnT I between the transmembrane domain and base position 376
(Figure 4); 9 of these Pro residues occur in a short stretch
of 21 amino acids (bases 314-376, Figure 4). This Pro-rich
"neck" may play a role in positioning the catalytic domain
in the lumen of the Golgi to enable glycosylation of
glycoproteins moving along the Golgi lumen.

The domain structure of GnT I appears to be similar to
that of the previously cloned glycosyltransferases.
However, GnT I differs from these transferases in being a
medial-Golgi enzyme, at least in some tissues (Dunphy et al
(1985) Cell, vol. 40, 463-472; Kornfeld et al (1985) Ann.
Rev. Biochem., vol. 54, 631-664). Although no medial-Golgi
glycosyltransferase has been cloned to date, rat liver
α-mannosidase II (also a medial-Golgi enzyme) has been
partially cloned (Moremen (1989) Proc. Natl. Acad. Sci. USA,
vol. 86(14), 5276-5280). Comparison with GnT I reveals a
16-amino acid sequence in GnT I (LHYRPSAELFPIIVSQ, bases
431-478, Figure 4) which shows a high similarity score to
amino acid residues 403-418 in α-mannosidase II
(LQYRNYEQLFSYMNSQ). Paulson's group (Paulson et al (1989)
J. Biol. Chem., vol. 264, 17615-17618; Colley et al (1989)
J. Biol. Chem., vol. 264, 17619-17622) has suggested that
the trans-Golgi retention signal lies in the amino-terminal
57 amino acids of the α2,6-sialyltransferase molecule. The
16-amino acid "consensus" sequence present in GnT I and
α-mannosidase II may be the equivalent medial-Golgi
retention signal. Joziasse et al (1989) J. Biol. Chem.,
vol. 264, 14290-14297, have suggested that a common
hexapeptide sequence K(R)DKKND(E) may serve as a UDP-Gal
binding site in the β1,4-Gal- and α1,3-Gal-transferases;
this sequence is not present in GnT I.
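
The comparison of the two 16-residue stretches can be illustrated with a short script. This is only a sketch: it counts exact identities position by position, whereas the "similarity score" referred to in the text presumably also credits conservative substitutions, so the figure it gives is a lower bound on similarity. The function name is illustrative and not part of the specification.

```python
def count_identities(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences share a residue."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(1 for x, y in zip(a, b) if x == y)


gnt1_motif = "LHYRPSAELFPIIVSQ"  # GnT I, bases 431-478 (Figure 4)
man2_motif = "LQYRNYEQLFSYMNSQ"  # alpha-mannosidase II, residues 403-418

identical = count_identities(gnt1_motif, man2_motif)
percent_identity = 100.0 * identical / len(gnt1_motif)
print(f"{identical}/{len(gnt1_motif)} identical ({percent_identity:.2f}%)")
```

The identical positions cluster at the two ends of the stretch (L-YR at the start, LF in the middle, SQ at the end).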

Sequence data indicate that the carboxy-terminal half
of human GnT I shows 87% nucleotide sequence similarity and
90% amino acid sequence similarity to the carboxy-terminal
half of rabbit liver GnT I. Strong homology between species
has also been observed for bovine, murine and human
UDP-Gal:GlcNAc-R β1,4-Gal-transferase (Appert et al (1986)
Biochem. Biophys. Res. Commun., vol. 139, 163-168;
D'Agostaro et al (1989) Eur. J. Biochem., vol. 183, 211-217;
Masri et al (1988) Biochem. Biophys. Res. Commun., vol. 157,
657-663; Narimatsu et al (1986) Proc. Nat. Acad. Sci. USA,
vol. 83, 4720-4724; Shaper et al (1986) Proc. Nat. Acad.
Sci. USA, vol. 83, 1573-1577; Shaper et al (1988) J. Biol.
Chem., vol. 263, 10420-10428; Nakazawa et al (1988)
J. Biochem. (Tokyo), vol. 104, 165-168), bovine and murine
UDP-Gal:Gal-R α1,3-Gal-transferase (Joziasse et al (1989)
J. Biol. Chem., vol. 264, 14290-14297; Larsen et al (1989)
Proc. Natl. Acad. Sci. USA, vol. 86, 8227-8231), murine and
human GDP-Fuc:Galβ1,4(3)GlcNAc-R (Fuc to GlcNAc)
α1,3(4)-Fuc-transferase (Gersten et al (1990) FASEB J., vol.
4, A1930; Kukowska-Latallo et al (1990) FASEB J., vol. 4,
A1930), and human and rat CMP-sialic acid:Gal-R
α2,6-sialyltransferase (Lance et al (1989) Biochem. Biophys.
Res. Commun., vol. 164, 225-232).

It has been reported (Kumar et al (1990) Mol. Cell
Biol., vol. 9, 5713-5717; Ripka et al (1989) Biochem.
Biophys. Res. Commun., vol. 159(2), 554-560; Ripka et al
(1990) J. Cellular Biochem., vol. 42, 117-122) that
transformation of Lec I Chinese hamster ovary (CHO) cell
mutants (which lack GnT I) with a crude preparation of total
human genomic DNA results in transfectants expressing GnT I
enzyme activity; this approach should allow cloning of the
human GnT I gene by the gene transfer and expression
screening method recently used to clone several
glycosyltransferases (Larsen et al (1989) Proc. Natl. Acad.
Sci. USA, vol. 86, 8227-8231; Larsen et al (1990) J. Biol.
Chem., vol. 265, 7055-7061; Smith et al (1990) J. Biol.
Chem., vol. 265, 6225-6234; Gersten et al (1990) FASEB J.,
vol. 4, A1930; Kukowska-Latallo et al (1990) FASEB J., vol. 4,
A1930; Rajan et al (1989) J. Biol. Chem., vol. 264(19),
11158-11167; Ernst et al (1989) J. Biol. Chem., vol. 264(6),
3436-3447).

Other features of the invention will become apparent in 
the course of the following descriptions of exemplary 
embodiments which are given for illustration of the 
invention and are not intended to be limiting thereof. 



EXAMPLES 

I. Rabbit GnT I:



Preparation of Peptides. Rabbit liver GnT I was
purified as previously described (Nishikawa et al (1988)
J. Biol. Chem., vol. 263, 8270-8281). Glycerol, Triton
X-100 and salts were removed from the purified enzyme
(approximately 15 µg) by "inverse-gradient" reversed-phase
high performance liquid chromatography (RP-HPLC) (Simpson et
al (1987) Eur. J. Biochem., vol. 165, 21-29). The enzyme
solution (100 µl) was diluted to 1.2 ml with n-propanol in a
sample-loading syringe, thoroughly mixed, and loaded at 1
ml/min on a VeloSep C8 cartridge (3-µm particle size, 30 x
2.1 mm i.d.; Applied Biosystems, Foster City, CA, USA)
previously equilibrated in 100% n-propanol at 40 °C. GnT I
was retained on the reversed-phase column under these
conditions whereas glycerol, Triton X-100 and salts were
washed through the column with 100% n-propanol. GnT I was
eluted at 0.1 ml/min as a sharp peak by a linear gradient
(5%/min) of decreasing n-propanol concentration (100% to
50%) generated with 100% n-propanol and 50% n-propanol/50%
water containing 0.4% (v/v) trifluoroacetic acid at 40 °C.
GnT I-containing fractions from the inverse gradient RP-HPLC
were pooled, adjusted to 0.02% (w/v) with respect to Tween
20 (Pierce Chemical Co., Rockford, IL, USA), concentrated to
100 µl in a 1.5-ml polypropylene tube using a centrifugal
vacuum concentrator to reduce the n-propanol concentration,
and diluted to 1.5 ml with 5% (v/v) formic acid containing
0.02% Tween 20.
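
As a rough check on the inverse-gradient step, the stated slope and flow rate fix the gradient duration and the volume delivered while the gradient runs. The sketch below assumes an ideal linear gradient with no dwell or mixing volume; the function name is illustrative and not taken from the cited method.

```python
def gradient_run(start_pct: float, end_pct: float,
                 rate_pct_per_min: float, flow_ml_per_min: float):
    """Return (duration in min, delivered volume in ml) for a linear gradient."""
    duration_min = abs(start_pct - end_pct) / rate_pct_per_min
    volume_ml = duration_min * flow_ml_per_min
    return duration_min, volume_ml


# 100% -> 50% n-propanol at 5%/min, eluting at 0.1 ml/min
duration, volume = gradient_run(100.0, 50.0, 5.0, 0.1)
print(f"gradient lasts {duration:.0f} min and delivers {volume:.1f} ml")
```

Under these assumptions the whole gradient spans about 10 min and 1 ml, which is consistent with GnT I being collected as a sharp peak in a small volume.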

Edman degradation of purified GnT I (~200 pmol)
yielded no N-terminal sequence, indicating N-terminal
blockage; proteolysis of GnT I was therefore undertaken.
GnT I was digested with pepsin (Sigma) at an
enzyme/substrate mass ratio of 1:20 for 1 h at 37 °C and the
digest was fractionated by RP-HPLC on a short microbore
column (30 x 2.1 mm i.d.) employing a low pH
(trifluoroacetic acid, pH 2.1) mobile phase and a gradient
of acetonitrile to yield peptides 5 and 6 (Figure 1). Core
GnT I remaining after pepsin digestion was reduced with
dithiothreitol and alkylated with iodoacetic acid (Simpson
et al (1988) Eur. J. Biochem., vol. 176, 187-197) to give
core S-carboxymethylated (SCM)-GnT I which was purified by
RP-HPLC (Simpson et al (1988) Eur. J. Biochem., vol. 176,
187-197; Simpson et al (1989) Anal. Biochem., vol. 177,
221-236). Pepsin-treated core SCM-GnT I (about 10 µg in
1 ml 1% ammonium bicarbonate, 1 mM CaCl2, 0.02% Tween 20) was
digested with trypsin (Worthington) at an enzyme/substrate
mass ratio of 1:20 for 15 h at 37 °C. RP-HPLC of the digest
showed that trypsin resulted in little further digestion of
the pepsin-treated material. Sequence analysis of a portion
of this material resulted in 33 amino acid assignments
(peptide 1, Figure 1). Pepsin- and trypsin-treated core
SCM-GnT I (about 8 µg in 1 ml 1% ammonium bicarbonate-0.02%
Tween 20) was digested with thermolysin (Sigma) at an
enzyme/substrate mass ratio of 1:20 for 2 h at 50 °C and the
digest was fractionated by RP-HPLC to yield peptides 2, 3,
4, 7 and 8 (Figure 1). Core GnT I was extremely resistant
to proteolysis even after reduction and alkylation,
indicating that the molecule is probably very compact.

HPLC. RP-HPLC was carried out on a Hewlett-Packard
liquid chromatograph (model 1090A) fitted with a diode array
detector (model 1040A) (Simpson et al (1988) Eur. J.
Biochem., vol. 176, 187-197). A Brownlee RP-300 column
(30-nm pore size, 7-µm diameter dimethyloctylsilica
particles packed into a stainless steel cartridge, 30 x 2.1
mm i.d.; Brownlee Laboratories, Santa Clara, CA, USA) was
used for all peptide separations.

Amino Acid Sequence Analysis. Automated amino acid
sequence analysis of GnT I and derived peptides was
performed with Applied Biosystems sequencers (models 470A
and 477A) equipped with on-line phenylthiohydantoin (PTH)
amino acid analyzers (model 120A). Polybrene (Klapper et al
(1978) Anal. Biochem., vol. 85, 126-131) was used as a
carrier.

Oligonucleotides and cDNA Synthesis. Oligonucleotides
were synthesized on a Pharmacia automated oligonucleotide
synthesizer at the Hospital for Sick Children-Pharmacia
Biotechnology Service Centre. Total RNA was prepared from
rabbit liver by the method of Chirgwin et al (Chirgwin et al
(1979) Biochemistry, vol. 18, 5294-5299; Ausubel et al
(1990) Current Protocols in Molecular Biology, Media,
PA: Greene Publishing Associates and John Wiley and Sons).
Poly(A)+RNA was prepared by oligo(dT) chromatography (Aviv
et al (1972) Proc. Natl. Acad. Sci. USA, vol. 69, 1408-1412)
using the mRNA Purification Kit supplied by Pharmacia.
Single-stranded cDNA synthesis was performed using the
RiboClone cDNA Synthesis System (Promega) with the following
modifications. Total rabbit liver RNA (20 µg) in a volume
of 5.5 µl was heated at 65 °C for 3 min followed by cooling
on ice for 5 min. The following reagents were added to a
final volume of 50 µl: 50 mM Tris-HCl, pH 8.3; 0.15 M KCl;
10 mM MgCl2; 2 mM dithiothreitol (DTT); each dNTP at 0.4 mM;
40 units of RNasin (Promega); 2 mM sodium pyrophosphate; a
mixture of the three anti-sense oligonucleotide primers 2A,
3A and 6A (Figure 1) at concentrations of 50 nM each; 20
units of AMV reverse transcriptase and 15 units of murine
leukemia virus reverse transcriptase. Incubation was at
42 °C for 2 hr. The reaction mixture was treated with NaOH
(0.25 N final concentration) for 5 min at room temperature
to destroy RNA. The solution was then heated at 65 °C for
1 min followed by cooling on ice for 5 min and neutralized
with HCl (0.25 N final concentration). This cDNA
preparation was used directly in the PCR reaction.

Amplification of cDNA. PCR was carried out in a total
volume of 0.1 ml containing 50 mM KCl, 10 mM Tris-HCl (pH
8.3), 1.5 mM MgCl2, 0.01% gelatin, each of the four dNTP at
0.2 mM, 0.5 µM of each oligonucleotide in six paired
combinations of oligonucleotide primers (2S-3A, 2S-6A,
3S-2A, 3S-6A, 6S-2A, 6S-3A, Figure 1), 10 µl of RNA-free
rabbit liver cDNA (see above), 2.5 units of Thermus
aquaticus (Taq) polymerase (Perkin-Elmer/Cetus) and 0.1 ml
of mineral oil. The samples were placed in an automated
heating/cooling block (DNA Thermal Cycler, Perkin-Elmer)
programmed for a temperature-step cycle of 94 °C (0.5 min),
50 °C (1 min) and 72 °C (2 min) for a total of 40 cycles
followed by a 10-minute extension at 72 °C after the final
cycle. DNA from the PCR reactions was purified with
GeneClean (Bio 101, Inc.) and analyzed by electrophoresis in
a 1% agarose gel containing ethidium bromide (0.5 µg/ml).
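
The cycling program and primer concentration above translate into a total hold time and a primer amount per reaction. The following is a minimal sketch that ignores the ramp times of the thermal cycler, which the text does not state; the class and variable names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float
    minutes: float


# Temperature-step cycle described in the text
cycle = [Step(94, 0.5), Step(50, 1.0), Step(72, 2.0)]
n_cycles = 40
final_extension = Step(72, 10.0)

# Total programmed hold time (ramping between temperatures not included)
hold_minutes = n_cycles * sum(s.minutes for s in cycle) + final_extension.minutes

# Primer amount per reaction: micromolar x microlitres = picomoles
primer_um = 0.5
reaction_ul = 100.0
primer_pmol = primer_um * reaction_ul

print(f"programmed hold time: {hold_minutes:.0f} min")
print(f"each primer: {primer_pmol:.0f} pmol per reaction")
```

On these assumptions each reaction runs for about two and a half hours of programmed hold time and contains 50 pmol of each primer.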
Two PCR products (0.45 and 0.50 kb) were detected and were
purified from a 1% agarose gel by GeneClean. The DNA ends
were filled in with T4 DNA polymerase (Moremen (1989) Proc.
Natl. Acad. Sci. USA, vol. 86(14), 5276-5280) and the blunt
ends were ligated into the SmaI site of pGEM-7z (Promega). The
recombinant plasmid was amplified in E. coli XL1-blue cells
and purified. The plasmid was used for sequencing and to
prepare a labelled probe for screening of a cDNA library.

Screening of rabbit liver cDNA library in λgt10. The
recombinant plasmid containing pGEM-7z and 0.5 kb PCR
product (see above) was cut with BamHI and used to generate
a riboprobe (0.5 kb) with the Promega Riboprobe Gemini II
Core System. The reaction contained in a total volume of
25 µl: 32 mM Tris-HCl, pH 7.5; 5 mM MgCl2; 2 mM spermidine;
8 mM sodium chloride; 8 mM DTT; 40 units RNasin; 0.4 mM of
each of ATP, GTP and UTP; 5 µl [α-32P]CTP (800 Ci/mmole);
1 µg of BamHI-cut pGEM-7z/PCR-product recombinant plasmid;
and 2 units T7 RNA polymerase. Incubation was at 40 °C for
2 hr. RNase-free DNase I (10 units) was added followed by
incubation at room temperature for 15 min. Buffer (80 µl of
50 mM Tris-HCl, pH 7.4; 4 mM EDTA; 300 mM NaCl; 0.1% SDS)
and tRNA (20 µg) were added followed by extraction with
phenol-chloroform-isoamyl alcohol (25:24:1, v/v). The
labelled RNA probe was desalted over a Sephadex G-50 column
(Nick Column, Pharmacia).

A rabbit liver cDNA library in λgt10 (5'-stretch, Cat.
No TL 1006a from Clontech, EcoRI cloning site) was
propagated in E. coli LE392 host cells and 10' plaques were
screened by standard plaque hybridization techniques
(Maniatis et al (1982) Molecular Cloning: a laboratory
manual, Cold Spring Harbor, N.Y.: Cold Spring Harbor
Laboratory) using the above riboprobe. Following fixation
of DNA to nitrocellulose membranes, the membranes were
washed for 1 hr at 45 °C in 50 mM Tris-HCl, pH 8.0/1 M NaCl/1
mM EDTA/0.1% SDS. Membranes were prehybridized at 50 °C for
2 hr in 1 M NaCl/50 mM sodium phosphate, pH 6.5/0.1% SDS/50%
freshly-deionized formamide/1% glycine/0.5% Blotto/5 mM
EDTA/1% yeast total RNA. Riboprobe (5 x 10*^ cpm/ml
hybridization solution) was added and hybridization was
carried out at 50 °C overnight. Membranes were washed in
2X SSC/0.1% SDS twice for 5 min at room temperature and twice
for 15 min at 50 °C. Positive isolates were identified by
autoradiography and were plaque-purified. DNA was purified
from phage lysates, digested with EcoRI, and cDNA inserts
were analyzed by agarose gel electrophoresis. The largest
cDNA insert obtained was 1.6 kb; it was subcloned into the
EcoRI site of pGEM-7z (Promega) by standard methods
(Maniatis et al (1982) Molecular Cloning: a laboratory
manual, Cold Spring Harbor, N.Y.: Cold Spring Harbor
Laboratory) and the recombinant plasmids were transfected
into E. coli XL1-blue. Colonies containing the recombinant
plasmid were selected and amplified, and plasmid DNA was
purified by CsCl gradient centrifugation (Ausubel et al
(1990) Current Protocols in Molecular Biology, Media,
PA: Greene Publishing Associates and John Wiley and Sons).

The cDNA library was re-screened as described above
using an 80 bp riboprobe prepared from the 5'-end of the
1.6 kb clone. The largest cDNA insert obtained was 3.0 kb.
This insert was sub-cloned into pGEM-7z as described above
and plasmid DNA was purified by CsCl gradient centrifugation
(Ausubel et al (1990) Current Protocols in Molecular
Biology, Media, PA: Greene Publishing Associates and John
Wiley and Sons), to obtain pGEM-7z-rcgntl.

DNA Sequencing. Two colonies of the pGEM-7z/
PCR-product recombinant plasmid (see above) containing
inserts in opposite directions were sequenced directly by
the single-strand dideoxynucleotide-chain-termination method
(Sanger et al (1977) Proc. Natl. Acad. Sci. USA, vol. 74,
5463-5467) using deoxyadenosine 5'-[α-[35S]thio]
triphosphate, Sequenase (United States Biochemical) and the
forward primer for pGEM-7z. The 1.6 and 3.0 kb clones were
sequenced by the Erase-a-Base System (Promega) and the
single-strand dideoxynucleotide-chain-termination method.
Both DNA strands were sequenced by using colonies in which
the inserts were present in opposite directions. Plasmid
DNA (12 µg) was cut with SphI to generate a 3'-overhang and
XbaI to generate a 5'-overhang. The cut DNA was digested
with exonuclease III (Erase-a-Base System, Promega) for
varying lengths of time followed by S1 nuclease digestion.
The DNA ends were blunt-ended with the Klenow fragment of
E. coli DNA polymerase I and the DNA was circularized with
T4 DNA ligase. The ligation mixtures were transfected into
competent XL1-blue cells. Miniplasmid preparations were
carried out on about 5-10 subclones from each exonuclease
III time point and were cut with BamHI and AatII to
determine DNA size. Colonies with appropriate deletions
were amplified and incubated with M13K07 helper phage at
37 °C for 1 hr followed by amplification in the presence of
kanamycin (70 µg/ml) for 6 hr at 37 °C. Single-stranded DNA
was produced by the helper phage and excreted into the
medium. The ss-DNA was purified from the medium by
polyethylene glycol precipitation and sequenced by the
dideoxynucleotide chain-termination method using
deoxyadenosine 5'-[α-[35S]thio]triphosphate, Sequenase
(United States Biochemical) and the forward primer for
pGEM-7z.

RNA Hybridization. Rabbit liver poly(A)+RNA (5 µg) was
denatured in 50% (v/v) formamide/6% (v/v) formaldehyde
buffer at 65 °C and was resolved by gel electrophoresis in a
1% agarose gel containing 6% (v/v) formaldehyde. The RNA
was transferred to a nitrocellulose filter and the filters
were hybridized with the 32P-labelled 0.5 kb PCR riboprobe
(see above) followed by autoradiography. The specific
activity of the probe was about 10* dpm/ng and the 
hybridization solution contained about 10* dpm/ml. 

In vitro transcription and translation. The
recombinant plasmid containing pGEM-7z (Promega) and the
2.5 kb GnT I cDNA insert (rc2500, Figure 2) (pGEM-7z-rcgntl)
was cut with SphI to generate linear plasmid. RNA was
transcribed using the SP6 RNA polymerase promoter and
initiation site present in pGEM-7z. RNA synthesis was
carried out at 40 °C for 1 hr in a total volume of 50 µl
containing 40 mM Tris-HCl (pH 7.5), 6 mM MgCl2, 2 mM
spermidine, 10 mM NaCl, 10 mM DTT, 40 units RNasin
(Promega), 0.5 mM of each of ATP, UTP and CTP, 0.1 mM GTP,
0.5 mM m7G(5')ppp(5')G (Pharmacia), 10 units SP6 RNA
polymerase and 10 µg linearized plasmid. Control
incubations were carried out in the absence of plasmid or
with a linearized pGEM-7z recombinant plasmid containing a
non-coding insert. The reaction mixture was extracted twice
with phenol-chloroform-isoamyl alcohol (25:24:1, v/v)
followed by precipitation with cold ethanol.

Protein synthesis (translation) was carried out at 30 °C
for 1 hr in a total volume of 50 µl containing all 20 amino
acids (1 mM each), 20 units of RNasin, RNA as prepared
above, and buffer and rabbit reticulocyte lysate as supplied
by Promega (Olliver et al (1984) "In vitro translation of
messenger RNA in a rabbit reticulocyte lysate cell-free
system", in: Walker, J. M., ed., Methods in Molecular
Biology. Nucleic Acids. Clifton, N.J.: Humana Press,
145-155). Non-radioactive amino acids were used when the
products of translation were assayed for GnT I activity (see
below). Separate incubations were carried out with
L-[35S]-methionine (1000 Ci/mmole; 90 µCi/incubation)
replacing non-radioactive Met; these incubations were
analyzed by SDS-polyacrylamide gel electrophoresis followed
by autoradiography.

GnT I was assayed (Schachter (1989) Methods Enzymol.,
vol. 179, 351-396; Brockhausen et al (1988) Biochem. Cell
Biol., vol. 66, 1134-1151) in a total volume of 40 µl
containing 20 mM MnCl2, bovine serum albumin (1 mg/ml), 0.1%
(v/v) Triton X-100, 0.1 M MES (pH 6.1), 0.5 mM UDP-N-[1-
14C]acetyl-D-glucosamine (2.2 mCi/mmole), 0.125 M GlcNAc and
0.6 mM Manα1-6(Manα1-3)Manβ-hexyl (a kind gift from Dr. Hans
Paulsen, University of Hamburg, Hamburg, Federal Republic of
Germany). Incubations were at 37 °C for 2 and 16 hr. The
reaction was stopped with 0.5 ml 20 mM sodium tetraborate/2
mM EDTA and was passed through a small column of AG1X8
(Cl-form, 100-200 mesh, equilibrated with water) to remove
radioactive nucleotide-sugar. The eluate was applied to a
Sep-Pak C-18 reverse phase cartridge (Waters) conditioned
with 20 ml methanol and 20 ml water. The cartridge was
washed with 20 ml water and radioactive product was eluted
with 5.0 ml methanol (Palcic et al (1988) Glycoconjugate
J., vol. 5, 49-63). An aliquot was counted directly and the
remainder was analyzed by HPLC on a C-18 reverse phase
column using acetonitrile-water (12:88) as the mobile phase
(Schachter et al (1989) Methods Enzymol., vol. 179, 351-396;
Brockhausen et al (1988) Biochem. Cell Biol., vol. 66,
1134-1151). Product co-eluted with a standard preparation
of Manα1-6(GlcNAcβ1-2Manα1-3)Manβ-hexyl at 36 min.

Preparation of pGEX-2T-rcgntl. This plasmid was
prepared from pGEM-7z-rcgntl by cutting out the insert
rcgntl with EcoRI. Plasmid pGEX-2T (Pharmacia) was
linearized with EcoRI and the insert was ligated into the
plasmid by standard procedures. The recombinant plasmid was
amplified in E. coli in the presence of ampicillin and
purified by cesium chloride centrifugation.

Amplification of cDNA. Three amino acid sequences
(Figure 1) were chosen for the design of sense and
anti-sense oligonucleotide primers to be used in the PCR
amplification of rabbit liver cDNA. Deoxyinosine was
substituted in positions where codon degeneracy was >2
(Moremen (1989) Proc. Natl. Acad. Sci. USA, vol. 86(14),
5276-5280); mixed pairs of bases were used in four positions
in all three sequences giving a 16-fold mixture of sequences
for every primer. Since we had no knowledge of the order of
the peptides in the amino acid sequence, PCR was carried out
with all six possible combinations of sense and anti-sense
primers (2S-3A, 2S-6A, 3S-2A, 3S-6A, 6S-2A, 6S-3A,
Figure 1). The products of the PCR reactions were analyzed
by agarose gel electrophoresis (Figure 3). Primer-dependent
products were obtained with two of the six incubations,
i.e., 2S-6A (500 bp) and 3S-6A (450 bp). The complete
nucleotide sequence for GnT I is shown in Figure 4.






WO 92/09694 PCT/CA91/00417

Oligonucleotide primers 2S and 3A are separated by only nine
bases thereby explaining the absence of PCR product with
this combination.
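
The primer bookkeeping described above (six sense/anti-sense pairings from three peptides, and a 16-fold mixture per primer) can be reproduced in a few lines. This is a sketch for illustration; the peptide labels follow Figure 1, while the variable names are assumptions of this example.

```python
from itertools import permutations

peptides = ["2", "3", "6"]  # peptides chosen for primer design (Figure 1)

# A sense primer from one peptide is paired with an anti-sense primer
# from a different peptide; both orientations are tried because the order
# of the peptides in the protein was unknown.
pairs = [f"{sense}S-{anti}A" for sense, anti in permutations(peptides, 2)]

# Four positions, each a mixture of two bases, give 2**4 primer variants.
mixed_positions = 4
bases_per_mixed_position = 2
degeneracy = bases_per_mixed_position ** mixed_positions

print(pairs)
print(f"{degeneracy}-fold mixture per primer")
```

Only pairings in which the sense primer lies upstream of the anti-sense primer at a workable distance can yield a product, which is why two of the six reactions gave bands.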

Sequence Analysis. The 1.6 kb clone contains 0.5 kb
from the 3'-end of the coding region and the full 1.1 kb
3'-untranslated region (rc1600, Figure 2). The 3.0 kb clone
yielded a 2485 bp sequence (rc2500, Figure 2; Figure 4). We
have shown that subcloning of the 3.0 kb DNA fragment in
pGEM-7z results in deletion of a 0.5 kb DNA fragment near
the 5'-end of the clone. Comparison of the cDNA sequence
shown in Figure 4 with the sequence of human genomic DNA for
GnT I (in preparation) has shown that this deleted 0.5 kb
DNA fragment is not part of the GnT I gene; we do not know
the origin of this DNA.

The GnT I coding sequence has 1341 bp and codes for a
membrane-bound protein of 447 amino acids (Mr 52,000). There
is a single hydrophobic domain (bases 62 to 136) flanked by
charged amino acids (Figure 4). Chou-Fasman rules (Chou et
al (1978) Adv. Enzymol., vol. 47, 45-147) predict that this
hydrophobic segment is capable of propagating an α-helix, as
expected for a transmembrane domain.

The presumptive initiation Met codon is the ATG
codon at position 50, which has an A at position 47, thereby
fulfilling the requirements for an initiation codon (Kozak
(1983) Microbiological Reviews, vol. 47, 1-45). All eight
peptides shown in Figure 1 (a total of 103 amino acid
residues) can be identified in the sequence (Figure 4); an
additional five tentative assignments also match the
sequence. GnT I purified from rabbit liver has a molecular
weight of about 45 kDa (Nishikawa et al (1988) J. Biol.
Chem., vol. 263, 8270-8281). The protein has no N-glycans
since none of the nine Asn residues are in a typical
Asn-X-Ser(Thr) sequence; we have previously shown that
rabbit liver GnT I binds poorly to lectin/agarose columns
(Nishikawa et al (1988) J. Biol. Chem., vol. 263,
8270-8281). If there are no or few O-glycans, a
catalytically active protein of 45 kDa can be derived by
cleavage at about base position 215 (Figure 4).
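
The check for Asn-X-Ser(Thr) sites can be sketched as a simple pattern search. Two points are assumptions of this example rather than statements from the text: proline is excluded at the X position, following the usual convention for N-glycosylation sequons, and the two test peptides are hypothetical sequences, not taken from Figure 4.

```python
import re

# N-X-S/T sequon, where X is any residue except proline (assumed convention).
# The lookahead allows overlapping sequons to be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(protein: str) -> list[int]:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


# Hypothetical peptides for illustration only
with_sites = "MKNVSAANPTLLNGT"     # N-V-S and N-G-T qualify; N-P-T does not
without_sites = "MKNVAAANPTLLNGA"  # no qualifying Asn

print(find_sequons(with_sites))
print(find_sequons(without_sites))
```

Applied to the 447-residue sequence of Figure 4, an empty result would correspond to the statement that none of the nine Asn residues lies in a typical sequon.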

Comparison of the GnT I sequence with those of several
previously cloned glycosyltransferases (Appert et al (1986)
Biochem. Biophys. Res. Commun., vol. 139, 163-168;
D'Agostaro et al (1989) Eur. J. Biochem., vol. 183, 211-217;
Hollis et al (1989) Biochem. Biophys. Res. Commun., vol.
162, 1069-1075; Joziasse et al (1989) J. Biol. Chem., vol.
264, 14290-14297; Larsen et al (1989) Proc. Natl. Acad. Sci.
USA, vol. 86, 8227-8231; Larsen et al (1990) J. Biol. Chem.,
vol. 265, 7055-7061; Masibay et al (1989) Proc. Natl. Acad.
Sci. USA, vol. 86, 5733-5737; Masri et al (1988) Biochem.
Biophys. Res. Commun., vol. 157, 657-663; Narimatsu et al
(1986) Proc. Nat. Acad. Sci. USA, vol. 83, 4720-4724; Russo
et al (1990) J. Biol. Chem., vol. 265, 3324-3331; Shaper et
al (1986) Proc. Nat. Acad. Sci. USA, vol. 83, 1573-1577;
Shaper et al (1988) J. Biol. Chem., vol. 263, 10420-10428;
Shaper et al (1988) Biochimie, vol. 70, 1683-1688; Shaper
et al (1990) Proc. Natl. Acad. Sci. USA, vol. 87, 791-795;
Smith et al (1990) J. Biol. Chem., vol. 265, 6225-6234;
Weinstein et al (1987) J. Biol. Chem., vol. 262,
17735-17743) revealed no sequence homology but GnT I appears
to have a domain structure typical of these enzymes (Paulson
et al (1989) J. Biol. Chem., vol. 264, 17615-17618).
Searches of the GenBank nucleotide data base (release 62.0)
with the coding region of GnT I and of the PIR Protein Data
Base (release 23.0) with the GnT I amino acid sequence
revealed no significant similarities to other sequences.

The complete sequence has a long 3'-untranslated region
(bases 1391-2479) containing the consensus polyadenylation
signal AATAAA at position 2435 (Tosi et al (1981) Nucleic
Acids Research, vol. 9, 2313-2323). Long 3'-untranslated
regions are typical of the known glycosyltransferase genes
and may be a feature present in other Golgi-localized
enzymes (Moremen (1989) Proc. Natl. Acad. Sci. USA, vol.
86(14), 5276-5280).
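
The base positions quoted for Figure 4 can be cross-checked against one another by converting them to residue numbers. The sketch below assumes that numbering starts with the A of the presumptive ATG at base 50 and that the 1341 bp coding sequence excludes the stop codon; the helper name is illustrative.

```python
START = 50         # first base of the presumptive ATG (Figure 4 numbering)
CDS_LENGTH = 1341  # bp, as stated in the text


def base_to_residue(base: int, start: int = START) -> int:
    """1-based residue number of the codon containing `base`."""
    if base < start:
        raise ValueError("base lies upstream of the start codon")
    return (base - start) // 3 + 1


n_residues = CDS_LENGTH // 3
last_coding_base = START + CDS_LENGTH - 1

hydrophobic = (base_to_residue(62), base_to_residue(136))
pro_rich = (base_to_residue(314), base_to_residue(376))
cleavage = base_to_residue(215)

print(f"{n_residues} residues; coding bases {START}-{last_coding_base}")
print(f"hydrophobic domain: residues {hydrophobic[0]}-{hydrophobic[1]}")
print(f"Pro-rich stretch: residues {pro_rich[0]}-{pro_rich[1]}")
print(f"approximate cleavage site: residue {cleavage}")
```

Under these assumptions the figures are mutually consistent: the coding region ends at base 1390, immediately before the 3'-untranslated region quoted as starting at base 1391, and bases 314-376 span 21 codons, matching the 21-amino-acid Pro-rich stretch.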

Northern Blot Analysis. The PCR riboprobe was used to
determine the size of mRNA in rabbit liver. A major band
was detected at about 3.0 kb with some smearing at lower
molecular weights (data not shown), indicating that the 2.5
kb cDNA clone (Figure 4) may not be full-length.

In vitro transcription and translation. Transcription
of the linearized pGEM-7z/2.5 kb GnT I cDNA recombinant
plasmid (pGEM-7z-rcgntl) followed by translation in the
presence of L-[35S]Met resulted in the appearance of a
strong radioactive 52 kDa band on SDS-polyacrylamide gel
electrophoresis; this band was not seen in control
incubations lacking plasmid or containing control plasmid
(Figure 5). The molecular weight matches the prediction for
the open reading frame shown in Figure 4. Table 1 shows the
results of GnT I assays carried out on the
transcription-translation incubations. The incubation
containing the pGEM-7z/2.5 kb GnT I cDNA recombinant plasmid
(pGEM-7z-rcgntl) has appreciable GnT I activity whereas both
controls show low activity. It is concluded that the 2.5 kb
sequence shown in Figure 4 can code for the synthesis of
catalytically active GnT I.



TABLE 1

In vitro transcription-translation of rabbit GnT I cDNA

                                   GnT I product
                       (nmoles/total transcription incubation)

Conditions of          Sep-Pak assays          HPLC assays
transcription          2 hr       16 hr        16 hr

No plasmid             0.04       0.21
Control plasmid        0.04       0.21         0.29
2.5 kb GnT I cDNA      0.41       1.05         1.32
(pGEM-7z-rcgntl)
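
The difference between the cDNA incubation and the controls in Table 1 can be expressed as a ratio. The following is a minimal sketch; the dictionary layout and the function name `fold_over_control` are illustrative assumptions, and the numbers are copied from Table 1 without any further correction.

```python
# GnT I product (nmoles per total transcription incubation), from Table 1.
table1 = {
    "No plasmid": {"Sep-Pak 2 hr": 0.04, "Sep-Pak 16 hr": 0.21},
    "Control plasmid": {"Sep-Pak 2 hr": 0.04, "Sep-Pak 16 hr": 0.21, "HPLC 16 hr": 0.29},
    "2.5 kb GnT I cDNA": {"Sep-Pak 2 hr": 0.41, "Sep-Pak 16 hr": 1.05, "HPLC 16 hr": 1.32},
}


def fold_over_control(assay, sample="2.5 kb GnT I cDNA", control="Control plasmid"):
    """Ratio of product formed with the sample plasmid to that of the control."""
    return table1[sample][assay] / table1[control][assay]


for assay in table1["Control plasmid"]:
    print(f"{assay}: {fold_over_control(assay):.2f}-fold over control plasmid")
```

On these figures the cDNA incubation gives roughly 10 times the control value in the 2 hr Sep-Pak assay and about 5 times in the 16 hr assays.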

II. Human GnT I:

The polymerase chain reaction (PCR) was used to obtain
a 0.5 kb ds-cDNA representing the carboxy-terminal half of
the rabbit liver GnT I coding sequence, and this DNA
fragment was labelled by the random primer technique. The
preparation of this probe is described above.

The rabbit cDNA probe was used to screen 10^ plaques
from an amplified human genomic DNA library in λEMBL3
prepared from chromosomal DNA from chronic myeloid leukemia
cells. Positive plaques (23) were purified and phage DNA
was subjected to restriction enzyme analysis using the 0.5
kb rabbit cDNA as probe. All 23 preparations gave the same
Sau3A 0.4 kb fragment. This fragment showed 87% base
similarity and 90% amino acid sequence similarity to the
rabbit GnT I carboxy-terminal sequence. Inserts of 13 and
15 kb were cut from two of the human genomic DNA clones with
SalI and subcloned into plasmid pGEM-5zf(+) (Promega).
Restriction maps of the two inserts show that they represent
an overlapping 18 kb DNA sequence.

The coding sequence was located in a 4.0 kb fragment of
human genomic DNA by screening restriction maps with a probe
containing the entire coding region of the rabbit GnT I
cDNA. This 4.0 kb DNA fragment was cut out by restriction
enzymes, subcloned into the sequencing vector pGEM-5zf(+)
to yield pGEM-5z-hggntl, and sequenced. Transfection of the
gene into Lec 1 Chinese hamster ovary cell mutants (which
lack GnT I activity) results in the expression of GnT I
activity, indicating the presence of a functional promoter
5'-upstream of the transcription start site.

The 4 kb sequence contains an open reading frame coding
for a protein with 445 amino acids (two fewer than the rabbit
enzyme). The DNA contains a functional promoter and an
intronless gene. The similarity between the rabbit and
human enzymes is 85% for the nucleotide coding sequences and
over 90% for the amino acid sequences.
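
Similarity figures of this kind are obtained by counting matching positions in aligned sequences. The sketch below is an illustration, not the procedure actually used: it applies a hypothetical helper `percent_identity` to the first 15 codons of the rabbit and human coding sequences as they appear in the sequence listings of this document, so the result for this short stretch is not expected to reproduce the overall 85% value.

```python
def percent_identity(a, b):
    """Percentage of identical positions in two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# First 15 codons of each coding sequence (spaces removed before comparison).
rabbit_nt = "atg ctg aag aag cag tct gct ggg ctt gtg ctg tgg ggt gct atc".replace(" ", "")
human_nt = "atg ctg aag aag cag tct gca ggg ctt gtg ctg tgg ggc gcc atc".replace(" ", "")

# Both stretches encode the same 15 residues.
rabbit_aa = "MLKKQSAGLVLWGAI"
human_aa = "MLKKQSAGLVLWGAI"

print(f"nucleotide identity: {percent_identity(rabbit_nt, human_nt):.1f}%")
print(f"amino acid identity: {percent_identity(rabbit_aa, human_aa):.1f}%")
```

In this stretch the three nucleotide differences all fall in third codon positions and leave the encoded residues unchanged, which is one reason amino acid similarity can exceed nucleotide similarity.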

Obviously, numerous modifications and variations of the
present invention are possible in light of the above
teachings. It is therefore to be understood that, within
the scope of the appended claims, the invention may be
practiced otherwise than as specifically described herein.
The references cited in the specification are incorporated
herein by reference.
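
1. An isolated DNA sequence encoding a protein having the
amino acid sequence of formula I:

MET LEU LYS LYS GLN SER ALA GLY LEU VAL LEU TRP GLY ALA ILE
LEU PHE VAL ALA TRP ASN ALA LEU LEU LEU LEU PHE PHE TRP THR
ARG PRO VAL PRO SER ARG LEU PRO SER ASP ASN ALA LEU ASP ASP
ASP PRO ALA SER LEU THR ARG GLU VAL ILE ARG LEU ALA GLN ASP
ALA GLU VAL GLU LEU GLU ARG GLN ARG GLY LEU LEU GLN GLN ILE
ARG GLU HIS HIS ALA LEU TRP SER GLN ARG TRP LYS VAL PRO THR
ALA ALA PRO PRO ALA GLN PRO HIS VAL PRO VAL THR PRO PRO PRO
ALA VAL ILE PRO ILE LEU VAL ILE ALA CYS ASP ARG SER THR VAL
ARG ARG CYS LEU ASP LYS LEU LEU HIS TYR ARG PRO SER ALA GLU
LEU PHE PRO ILE ILE VAL SER GLN ASP CYS GLY HIS GLU GLU THR
ALA GLN VAL ILE ALA SER TYR GLY SER ALA VAL THR HIS ILE ARG
GLN PRO ASP LEU SER ASN ILE ALA VAL GLN PRO ASP HIS ARG LYS
PHE GLN GLY TYR TYR LYS ILE ALA ARG HIS TYR ARG TRP ALA LEU
GLY GLN ILE PHE HIS ASN PHE ASN TYR PRO ALA ALA VAL VAL VAL
GLU ASP ASP LEU GLU VAL ALA PRO ASP PHE PHE GLU TYR PHE GLN
ALA THR TYR PRO LEU LEU LYS ALA ASP PRO SER LEU TRP CYS VAL
SER ALA TRP ASN ASP ASN GLY LYS GLU GLN MET VAL ASP SER SER
LYS PRO GLU LEU LEU TYR ARG THR ASP PHE PHE PRO GLY LEU GLY
TRP LEU LEU LEU ALA GLU LEU TRP ALA GLU LEU GLU PRO LYS TRP
PRO LYS ALA PHE TRP ASP ASP TRP MET ARG ARG PRO GLU GLN ARG
LYS GLY ARG ALA CYS VAL ARG PRO GLU ILE SER ARG THR MET THR
PHE GLY ARG LYS GLY VAL SER HIS GLY GLN PHE PHE ASP GLN HIS
LEU LYS PHE ILE LYS LEU ASN GLN GLN PHE VAL PRO PHE THR GLN
LEU ASP LEU SER TYR LEU GLN GLN GLU ALA TYR ASP ARG ASP PHE
LEU ALA ARG VAL TYR GLY ALA PRO GLN LEU GLN VAL GLU LYS VAL
ARG THR ASN ASP ARG LYS GLU LEU GLY GLU VAL ARG VAL GLN TYR
THR GLY ARG ASP SER PHE LYS ALA PHE ALA LYS ALA LEU GLY VAL
MET ASP ASP LEU LYS SER GLY VAL PRO ARG ALA GLY TYR ARG GLY
ILE VAL THR PHE LEU PHE ARG GLY ARG ARG VAL HIS LEU ALA PRO
PRO GLN THR TRP ASP GLY TYR ASP PRO SER TRP THR.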
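2. The DNA sequence of Claim 1, having the nucleotide
sequence of formula III:

atg ctg aag aag cag tct gct ggg ctt gtg ctg tgg ggt gct atc
ctc ttt gtg gcc tgg aat gcc ctg ctg ctc ctc ttc ttc tgg aca
cgt cca gtg cct agc agg ctg ccg tca gac aat gct ctc gat gat
gac cct gcc agc ctc acc cgt gag gtg atc cgc tta gct cag gat
gcc gag gta gag ttg gaa cgt cag cgg gga ctg ttg cag cag att
agg gag cac cat gct ctt tgg agc cag cgg tgg aag gtg cct act
gca gcc cct cct gct cag ccg cat gtg cct gtg acc cca ccg cca
gct gtg atc ccc atc ctg gta att gcc tgt gac cgc agc acc gtc
cgc cgc tgt ttg gac aag cta ctg cat tat cgg cct tca gct gag
ctg ttc ccc atc att gtc agc cag gac tgt ggg cat gag gag aca
gcc cag gtc att gct tcc tat ggc agc gca gtc aca cac atc cgg
caa cct gac ctg agc aac att gct gtg cag ccc gac cac cgc aag
ttc cag ggc tac tac aag atc gca cgg cat tac cgc tgg gca ttg
ggc caa atc ttc cac aat ttc aac tac cca gca gct gtg gtg gtg
gaa gat gat ctc gag gtg gca cca gac ttc ttt gag tac ttc cag
gcc act tac cca ctg ttg aaa gca gac ccc tcc ctc tgg tgt gtg
tct gcc tgg aat gac aat ggc aaa gaa cag atg gta gac tcg agt
aag cca gag tta ctc tac cgc aca gat ttc ttt cct ggc tta ggc
tgg tta ctg ttg gct gaa ctc tgg gct gaa ctg gag ccc aag tgg
ccc aaa gcc ttc tgg gat gac tgg atg cgc cgg cct gag cag cga
aag ggg agg gcc tgt gtg cgt cca gaa atc tca aga aca atg aca
ttt ggc cgg aag ggt gtg agc cat ggg cag ttc ttt gac cag cat
ctc aag ttc atc aag ctg aac cag cag ttt gta ccc ttc acc cag
ctg gac ctg tcg tac ctt cag cag gag gcc tat gac cgg gat ttc
ctt gct cgt gtt tat ggt gct ccc cag tta cag gtg gag aaa gtg
agg acc aat gac cgg aag gag cta gga gag gtg cgc gta cag tac
aca ggc agg gac agc ttc aag gct ttc gcc aag gcc ctg ggt gtc
atg gat gac ctc aaa tca ggt gta ccc agg gct gga tac cgg ggc
att gtc acc ttc tta ttc cgg ggc cgc cgt gtc cac ctg gcg ccc
cct cag act tgg gat ggc tat gat cct agt tgg act.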

3. The DNA sequence of Claim 1, having the nucleotide
sequence of formula IV:

gaattccggc aagtcatacc tttgcctgcc ctcccctgtg ggggccagg
atg ctg aag aag cag tct gct ggg ctt gtg ctg tgg ggt gct atc
ctc ttt gtg gcc tgg aat gcc ctg ctg ctc ctc ttc ttc tgg aca
cgt cca gtg cct agc agg ctg ccg tca gac aat gct ctc gat gat
gac cct gcc agc ctc acc cgt gag gtg atc cgc tta gct cag gat
gcc gag gta gag ttg gaa cgt cag cgg gga ctg ttg cag cag att
agg gag cac cat gct ctt tgg agc cag cgg tgg aag gtg cct act
gca gcc cct cct gct cag ccg cat gtg cct gtg acc cca ccg cca
gct gtg atc ccc atc ctg gta att gcc tgt gac cgc agc acc gtc
cgc cgc tgt ttg gac aag cta ctg cat tat cgg cct tca gct gag
ctg ttc ccc atc att gtc agc cag gac tgt ggg cat gag gag aca
gcc cag gtc att gct tcc tat ggc agc gca gtc aca cac atc cgg
caa cct gac ctg agc aac att gct gtg cag ccc gac cac cgc aag
ttc cag ggc tac tac aag atc gca cgg cat tac cgc tgg gca ttg
ggc caa atc ttc cac aat ttc aac tac cca gca gct gtg gtg gtg
gaa gat gat ctc gag gtg gca cca gac ttc ttt gag tac ttc cag
gcc act tac cca ctg ttg aaa gca gac ccc tcc ctc tgg tgt gtg
tct gcc tgg aat gac aat ggc aaa gaa cag atg gta gac tcg agt
aag cca gag tta ctc tac cgc aca gat ttc ttt cct ggc tta ggc
tgg tta ctg ttg gct gaa ctc tgg gct gaa ctg gag ccc aag tgg
ccc aaa gcc ttc tgg gat gac tgg atg cgc cgg cct gag cag cga
aag ggg agg gcc tgt gtg cgt cca gaa atc tca aga aca atg aca
ttt ggc cgg aag ggt gtg agc cat ggg cag ttc ttt gac cag cat
ctc aag ttc atc aag ctg aac cag cag ttt gta ccc ttc acc cag
ctg gac ctg tcg tac ctt cag cag gag gcc tat gac cgg gat ttc
ctt gct cgt gtt tat ggt gct ccc cag tta cag gtg gag aaa gtg
agg acc aat gac cgg aag gag cta gga gag gtg cgc gta cag tac
aca ggc agg gac agc ttc aag gct ttc gcc aag gcc ctg ggt gtc
atg gat gac ctc aaa tca ggt gta ccc agg gct gga tac cgg ggc
att gtc acc ttc tta ttc cgg ggc cgc cgt gtc cac ctg gcg ccc
cct cag act tgg gat ggc tat gat cct agt tgg act
taacagctcc tgcctgtccc ttctgggctc cttccttgca atttcatgat ctaagatggg
accgtagtcc ctgggctgca ttgtcttttc tgtctttccc tcttgggtcc attttttttt
ttttcttttt tgagtggcat ttgaatacac agatgacaag gtgagggttc ttttgttaaa
ggagttagat cagggaaagc attctgctgt ctgttgggta tcaagcagca aaccactgtg
tgatagggga agaatgggct ttttggggcc agaaatatcc atgttctgag tttttctctt
aggtcatctg cagaggagtt ggcaacttta gctttcttaa ccaggccttt tctttctgac
ctgagagcca gggcatgaga cttcttgttc atgctccttt ttaccttccc ctaataaggg
tctgggctac aggagaagtg aacatattgt ggccagaata atactaacca gaggggcctc
attgtcagag tctaggtgca gttattgggt tgtcagagtt aatgccttct gttcttcttt
ccttattcct gacttctgtc agctcttctt tctttgcagc ctagcaattt ttggttctaa
gatgaaaaat gaagaggaaa agaaatattc gcacccagct attgggagaa aggtagtggg
aaaaaaactt cattgtacca cttcaaagag acactcttga cctcttcctt tctaaaaatt
agtcccctcc ctgttgcttc aggagaatgc tgtgctggtc agttctgtgt gatccttctt
ccctgagttt tatacacagg ctcctcccta aggctgtggc ttctggtggc cctcctgaca
taagttacag tggccaagac caggacaact ccggccatga gctaagtcct gcctaccttc
tccaaaacat tcccatgtcc tcacaggcta ggatgcagat gttggttgga gaggaatttg
tgtgtgtgtg tgtgtgtgtg tgtgttttct tgcctgacct cagtttcatg gatgaaaagt
ggaagctaca gaattatttt caaaaataaa ggctgaattg tctgaaaaaa aaaaaaaaaa
aaaaaaccgg aattc.

4. An isolated DNA sequence encoding a protein having
the amino acid sequence of formula II:

MET LEU LYS LYS GLN SER ALA GLY LEU VAL LEU TRP GLY ALA ILE
LEU PHE VAL ALA TRP ASN ALA LEU LEU LEU LEU PHE PHE TRP THR
ARG PRO ALA PRO GLY ARG PRO PRO SER VAL SER ALA LEU ASP GLY
ASP PRO ALA SER LEU THR ARG GLU VAL ILE ARG LEU ALA GLN ASP
ALA GLU VAL GLU LEU GLU ARG ARG ARG GLY LEU LEU GLN GLN ILE
GLY ASP ALA LEU SER SER GLN ARG GLY ARG VAL PRO THR ALA ALA
PRO PRO ALA GLN PRO ARG VAL PRO VAL THR PRO ALA PRO ALA VAL
ILE PRO ILE LEU VAL ILE ALA CYS ASP ARG SER THR VAL ARG ARG
CYS LEU ASP LYS LEU LEU HIS TYR ARG PRO SER ALA GLU LEU PHE
PRO ILE ILE VAL SER GLN ASP CYS GLY HIS GLU GLU THR ALA GLN
ALA ILE ALA SER TYR GLY SER ALA VAL THR HIS ILE ARG GLN PRO
ASP LEU SER SER ILE ALA VAL PRO PRO ASP HIS ARG LYS PHE GLN
GLY TYR TYR LYS ILE ALA ARG HIS TYR ARG TRP ALA LEU GLY GLN
VAL PHE ARG GLN PHE ARG PHE PRO ALA ALA VAL VAL VAL GLU ASP
ASP LEU GLU VAL ALA PRO ASP PHE PHE GLU TYR PHE ARG ALA THR
TYR PRO LEU LEU LYS ALA ASP PRO SER LEU TRP CYS VAL SER ALA
TRP ASN ASP ASN GLY LYS GLU GLN MET VAL ASP ALA SER ARG PRO
GLU LEU LEU TYR ARG THR ASP PHE PHE PRO GLY LEU GLY TRP LEU
LEU LEU ALA GLU LEU TRP ALA GLU LEU GLU PRO LYS TRP PRO LYS
ALA PHE TRP ASP ASP TRP MET ARG ARG PRO GLU GLN ARG GLN GLY
ARG ALA CYS ILE ARG PRO GLU ILE SER ARG THR MET THR PHE GLY
ARG LYS GLY VAL THR HIS GLY GLN PHE PHE ASP GLN HIS LEU LYS
PHE ILE LYS LEU ASN GLN GLN PHE VAL HIS PHE THR GLN LEU ASP
LEU SER TYR LEU GLN ARG GLU ALA TYR ASP ARG ASP PHE LEU ALA
ARG VAL TYR GLY ALA PRO GLN LEU GLN VAL GLU LYS VAL ARG THR
ASN ASP ARG LYS GLU LEU GLY GLU VAL ARG VAL GLN TYR THR GLY
ARG ASP SER PHE LYS ALA PHE ALA LYS ALA LEU GLY VAL MET ASP
ASP LEU LYS SER GLY VAL PRO ARG ALA GLY TYR ARG GLY ILE VAL
THR PHE GLN PHE ARG GLY ARG ARG VAL HIS LEU ALA PRO PRO PRO
THR TRP GLU GLY TYR ASP PRO SER TRP ASN.

5. The DNA sequence of Claim 4, having the nucleotide
sequence of formula V:

a.8C.saa M»8"gc=c 8C.SSS="8 c|™= H^lHHl 

ssa.tgcccc gotgctcccc ctcctctgga cgcgcccagc ""SS"!* ^^.-.^ccaag 
„.gci=tc= cgacggcgac cccgccagcc Slarcglg 8aSc="8^ 

acgccgagg. ggag«|g^g cg=a|gc|.| ggccgc.gca g„ga„ggg g^^g^^^^^^ 

:i:r4:r=t ssns ^—^-^^ 
pcccctcccc accaccgtra

gccgcccgga caagccgccg = aggccatcgc <^tcccacgg ^ | ^^^^ 

iccaggactg cgggcacgag «gcgst:gcc g<^-8|^"^^ tfccg|cagt: 

'cgcacatocg ^gfclctlcc gctgggcgcc ||^-"|f^; gLtSr.cg 

agggctacta caagatcgcg atgaccCgga 88^88""! tgcgtctcgg 

tccgcccccc cgcggccgng g^H^II aggccgaccc ,|g|cccacc 
agcacccccg ga|cagargg -88-fcS cgfgc^'ss gc.gag^gg 

cccggaatga ^^^f % etgggctggc tS<^*S^^8!^ eLgccggag cagcggcagg 
gcaccgac« ^"^^J^cc trctgggacg actggatgcg ^8 S ^ g^g^ga 

agcccaagcg gccaaaggc gaacgaCgac ^ cagtttgtgc 

ggcgggcctg catacgccct 8-8^ ^^^, agrrcarcaa SC^J^^-S ^ 
cgcaogggca g^-^^Jg "Iraccrgo agcgggaggc c-^J-^i, |accggaagg 

=11 8ir8|S rg=i 

ffi^^ rg-^^fc Sr^c ^cafcga. .gg8agg8C. 
acgaccctag crggaar. ^^^.^^ nucleotide 

6. The DNA sequence of Claim 

sequence of formula VI: ^^_^^^aa tatttcctca ttrctctggc 

aag««gaa cgc«aag« ^^^^^^^ -"St. gagca-g C88a-^^^^^^ 
cnrcgcaagr aggg«rrcr ^a^ccacg S«gcrro 
ggccacccac "8--^^^^ r/aUca caagcaaaga "a-a8-^^^^ ^gf«c:l«ca 
ccc.cccrga g«^«^^|| tr«t:rc«c "ggttcatt tgcatrgg^^ |,,a«cgcg 
:2caSci: ggagacgacg 88caac«cg g.gggacatt 
gcaaaatgtr ^"""^ tettggtiaag gggttaactg ecagargctg 

ggg^gcccc IJ^f :Sgfg8Cgc -«88-t:a8^ 8|af4|::^ LgLagg.a 
cgattrtaaa aaccagcca acccggaggg ^aaacg ^^^tgcctct 

tcacrccttg ccccccccct ctocctccac cagcatcact J 

gaaagagggg ag«ggggcr ^^^^"^^^^ tatgttgact ccctagagag 
Lcccaaaaa "-8«cccg 88^-|^a^^ .gg.ertrrc crgc-ccac 
aaccrcccgg cctcccccca gacggrtcgc tzLggcaag 

.g.ggggcac ggagccac a g.™^ 8^^^^ .gcggg^^^S 

fgrgfcfcra fc: ajjs^^^^ i:-- sr^Sg 8j--^^ -ssrag . 

.cagcgcct csa^f sac - ^ ggotgccgca S^^tcgSr Irglccgrga 

acgccgaggr S8ag«||^| cccaccgcgg cccctcccgc ccagccg g 8^^ ^^^^g, 
cgagccagcg SSSSaggS 8 tcatcgcctg ^saccgc g 

cccccgcgcc SSCggtgatr cctcggctga g^tcttcccc 

gccgccrgga caagctgctg ^8^ ^gg«accgc crcc«cggc agcgcgg^^^ 

r-s =r« igi ^ 

.gt»cccc=« 88=""^" gl|c4a=85 tgs.'e"*^ "|!"cS| g=«a!"58 

--r 5Si =s sng « r4! 

s-cS -sss" bf-aS 
. . rrraaetcRg gggtcccgag agccggccac cggggtaccg
ccccgggtgc cacggatgac ""^S^^Sg |8S cccaccgacg tgggagggcc 

ccaccttcca gttccggggc <=S"S^g"^ tccttccCgg gccccttctt gccacatcat 
atgatcctag crggaactag cacccgcctg "cttcc gg 8 ctcttaggtg 
galctgaggr gaccacagtc cccaggctgc ^^-gS^^g^ caLrgataa caagaggact 
Llccatccc trtgattctc ccga|tgf a ttt-gt|^^ ^aggg.atg tcgcggggra 
atnctcccgt tctcaaggga g«agatcag gttggggcca caaatgtcca 

ccaagcagga aaacactgtg ^ggtgggggg ""^^g tctcttgacc 
cgccctgagc tttctcctgg ^^llllllll ^^Iglcacga gccctccttc tatacctgct 
agaccccttc tccctgactg g^tcttccag ^lllfj^ l^ Ltctgcggc caaaatga.a 
ccccttccca gtggggactg agttatggga g^*SS|g gcgggtcatc ggggctcacc 
ccaaccaaag gggcttcctt gtcagggccc 8g^gS^|^^S l^^^^^^^ gcagccragc 
gccrcctgcc ctrctotcct g«^|^"" ggggctlgca agaccccrcc rcagcccacg 
agcccacagc "cgagacgg -"g«|^;| iScct'gc gcrgggacaa cctctc.ctc 
cccagccgtc aggagagagg ^g"SgS^|| cc!ltccctt: ctgaaaatca gtgccctccc 
gccttacctt cagagaggac tatgccctga ^ aaCtcgatct gcctgtccct 

cgttgctcra ggaggcccct gctg|"///^ «fca|"4 agg.ggagca g.gaccaggr 
trttcccctg gggtctgaca "^^g|=^" tgctgLcag cctccccgcc cgctcccagg 

rgfargr/a icUgg gagca.gcga 
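7. A plasmid, comprising a DNA sequence encoding a protein
having the amino acid sequence of formula I:

MET LEU LYS LYS GLN SER ALA GLY LEU VAL LEU TRP GLY ALA ILE
LEU PHE VAL ALA TRP ASN ALA LEU LEU LEU LEU PHE PHE TRP THR
ARG PRO VAL PRO SER ARG LEU PRO SER ASP ASN ALA LEU ASP ASP
ASP PRO ALA SER LEU THR ARG GLU VAL ILE ARG LEU ALA GLN ASP
ALA GLU VAL GLU LEU GLU ARG GLN ARG GLY LEU LEU GLN GLN ILE
ARG GLU HIS HIS ALA LEU TRP SER GLN ARG TRP LYS VAL PRO THR
ALA ALA PRO PRO ALA GLN PRO HIS VAL PRO VAL THR PRO PRO PRO
ALA VAL ILE PRO ILE LEU VAL ILE ALA CYS ASP ARG SER THR VAL
ARG ARG CYS LEU ASP LYS LEU LEU HIS TYR ARG PRO SER ALA GLU
LEU PHE PRO ILE ILE VAL SER GLN ASP CYS GLY HIS GLU GLU THR
ALA GLN VAL ILE ALA SER TYR GLY SER ALA VAL THR HIS ILE ARG
GLN PRO ASP LEU SER ASN ILE ALA VAL GLN PRO ASP HIS ARG LYS
PHE GLN GLY TYR TYR LYS ILE ALA ARG HIS TYR ARG TRP ALA LEU
GLY GLN ILE PHE HIS ASN PHE ASN TYR PRO ALA ALA VAL VAL VAL
GLU ASP ASP LEU GLU VAL ALA PRO ASP PHE PHE GLU TYR PHE GLN
ALA THR TYR PRO LEU LEU LYS ALA ASP PRO SER LEU TRP CYS VAL
SER ALA TRP ASN ASP ASN GLY LYS GLU GLN MET VAL ASP SER SER
LYS PRO GLU LEU LEU TYR ARG THR ASP PHE PHE PRO GLY LEU GLY
TRP LEU LEU LEU ALA GLU LEU TRP ALA GLU LEU GLU PRO LYS TRP
PRO LYS ALA PHE TRP ASP ASP TRP MET ARG ARG PRO GLU GLN ARG
LYS GLY ARG ALA CYS VAL ARG PRO GLU ILE SER ARG THR MET THR
PHE GLY ARG LYS GLY VAL SER HIS GLY GLN PHE PHE ASP GLN HIS
LEU LYS PHE ILE LYS LEU ASN GLN GLN PHE VAL PRO PHE THR GLN
LEU ASP LEU SER TYR LEU GLN GLN GLU ALA TYR ASP ARG ASP PHE
LEU ALA ARG VAL TYR GLY ALA PRO GLN LEU GLN VAL GLU LYS VAL
ARG THR ASN ASP ARG LYS GLU LEU GLY GLU VAL ARG VAL GLN TYR
THR GLY ARG ASP SER PHE LYS ALA PHE ALA LYS ALA LEU GLY VAL
MET ASP ASP LEU LYS SER GLY VAL PRO ARG ALA GLY TYR ARG GLY
ILE VAL THR PHE LEU PHE ARG GLY ARG ARG VAL HIS LEU ALA PRO
PRO GLN THR TRP ASP GLY TYR ASP PRO SER TRP THR.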

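8. The plasmid of Claim 7, wherein said DNA sequence has
the formula III:

atg ctg aag aag cag tct gct ggg ctt gtg ctg tgg ggt gct atc
ctc ttt gtg gcc tgg aat gcc ctg ctg ctc ctc ttc ttc tgg aca
cgt cca gtg cct agc agg ctg ccg tca gac aat gct ctc gat gat
gac cct gcc agc ctc acc cgt gag gtg atc cgc tta gct cag gat
gcc gag gta gag ttg gaa cgt cag cgg gga ctg ttg cag cag att
agg gag cac cat gct ctt tgg agc cag cgg tgg aag gtg cct act
gca gcc cct cct gct cag ccg cat gtg cct gtg acc cca ccg cca
gct gtg atc ccc atc ctg gta att gcc tgt gac cgc agc acc gtc
cgc cgc tgt ttg gac aag cta ctg cat tat cgg cct tca gct gag
ctg ttc ccc atc att gtc agc cag gac tgt ggg cat gag gag aca
gcc cag gtc att gct tcc tat ggc agc gca gtc aca cac atc cgg
caa cct gac ctg agc aac att gct gtg cag ccc gac cac cgc aag
ttc cag ggc tac tac aag atc gca cgg cat tac cgc tgg gca ttg
ggc caa atc ttc cac aat ttc aac tac cca gca gct gtg gtg gtg
gaa gat gat ctc gag gtg gca cca gac ttc ttt gag tac ttc cag
gcc act tac cca ctg ttg aaa gca gac ccc tcc ctc tgg tgt gtg
tct gcc tgg aat gac aat ggc aaa gaa cag atg gta gac tcg agt
aag cca gag tta ctc tac cgc aca gat ttc ttt cct ggc tta ggc
tgg tta ctg ttg gct gaa ctc tgg gct gaa ctg gag ccc aag tgg
ccc aaa gcc ttc tgg gat gac tgg atg cgc cgg cct gag cag cga
aag ggg agg gcc tgt gtg cgt cca gaa atc tca aga aca atg aca
ttt ggc cgg aag ggt gtg agc cat ggg cag ttc ttt gac cag cat
ctc aag ttc atc aag ctg aac cag cag ttt gta ccc ttc acc cag
ctg gac ctg tcg tac ctt cag cag gag gcc tat gac cgg gat ttc
ctt gct cgt gtt tat ggt gct ccc cag tta cag gtg gag aaa gtg
agg acc aat gac cgg aag gag cta gga gag gtg cgc gta cag tac
aca ggc agg gac agc ttc aag gct ttc gcc aag gcc ctg ggt gtc
atg gat gac ctc aaa tca ggt gta ccc agg gct gga tac cgg ggc
att gtc acc ttc tta ttc cgg ggc cgc cgt gtc cac ctg gcg ccc
cct cag act tgg gat ggc tat gat cct agt tgg act.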
9. The plasmid of Claim 7, wherein said DNA sequence has
the formula IV:

gaattccggc aagtcatacc tttgcctgcc ctcccctgtg ggggccagg
atg ctg aag aag cag tct gct ggg ctt gtg ctg tgg ggt gct atc
ctc ttt gtg gcc tgg aat gcc ctg ctg ctc ctc ttc ttc tgg aca
cgt cca gtg cct agc agg ctg ccg tca gac aat gct ctc gat gat
gac cct gcc agc ctc acc cgt gag gtg atc cgc tta gct cag gat
gcc gag gta gag ttg gaa cgt cag cgg gga ctg ttg cag cag att
agg gag cac cat gct ctt tgg agc cag cgg tgg aag gtg cct act
gca gcc cct cct gct cag ccg cat gtg cct gtg acc cca ccg cca
gct gtg atc ccc atc ctg gta att gcc tgt gac cgc agc acc gtc
cgc cgc tgt ttg gac aag cta ctg cat tat cgg cct tca gct gag
ctg ttc ccc atc att gtc agc cag gac tgt ggg cat gag gag aca
gcc cag gtc att gct tcc tat ggc agc gca gtc aca cac atc cgg
caa cct gac ctg agc aac att gct gtg cag ccc gac cac cgc aag
ttc cag ggc tac tac aag atc gca cgg cat tac cgc tgg gca ttg
ggc caa atc ttc cac aat ttc aac tac cca gca gct gtg gtg gtg
gaa gat gat ctc gag gtg gca cca gac ttc ttt gag tac ttc cag
gcc act tac cca ctg ttg aaa gca gac ccc tcc ctc tgg tgt gtg
tct gcc tgg aat gac aat ggc aaa gaa cag atg gta gac tcg agt
aag cca gag tta ctc tac cgc aca gat ttc ttt cct ggc tta ggc
tgg tta ctg ttg gct gaa ctc tgg gct gaa ctg gag ccc aag tgg
ccc aaa gcc ttc tgg gat gac tgg atg cgc cgg cct gag cag cga
aag ggg agg gcc tgt gtg cgt cca gaa atc tca aga aca atg aca
ttt ggc cgg aag ggt gtg agc cat ggg cag ttc ttt gac cag cat
ctc aag ttc atc aag ctg aac cag cag ttt gta ccc ttc acc cag
ctg gac ctg tcg tac ctt cag cag gag gcc tat gac cgg gat ttc
ctt gct cgt gtt tat ggt gct ccc cag tta cag gtg gag aaa gtg
agg acc aat gac cgg aag gag cta gga gag gtg cgc gta cag tac
aca ggc agg gac agc ttc aag gct ttc gcc aag gcc ctg ggt gtc
atg gat gac ctc aaa tca ggt gta ccc agg gct gga tac cgg ggc
att gtc acc ttc tta ttc cgg ggc cgc cgt gtc cac ctg gcg ccc
cct cag act tgg gat ggc tat gat cct agt tgg act
taacagctcc tgcctgtccc ttctgggctc cttccttgca atttcatgat ctaagatggg
accgtagtcc ctgggctgca ttgtcttttc tgtctttccc tcttgggtcc attttttttt
ttttcttttt tgagtggcat ttgaatacac agatgacaag gtgagggttc ttttgttaaa
ggagttagat cagggaaagc attctgctgt ctgttgggta tcaagcagca aaccactgtg
tgatagggga agaatgggct ttttggggcc agaaatatcc atgttctgag tttttctctt
aggtcatctg cagaggagtt ggcaacttta gctttcttaa ccaggccttt tctttctgac
ctgagagcca gggcatgaga cttcttgttc atgctccttt ttaccttccc ctaataaggg
tctgggctac aggagaagtg aacatattgt ggccagaata atactaacca gaggggcctc
attgtcagag tctaggtgca gttattgggt tgtcagagtt aatgccttct gttcttcttt
ccttattcct gacttctgtc agctcttctt tctttgcagc ctagcaattt ttggttctaa
gatgaaaaat gaagaggaaa agaaatattc gcacccagct attgggagaa aggtagtggg
aaaaaaactt cattgtacca cttcaaagag acactcttga cctcttcctt tctaaaaatt
agtcccctcc ctgttgcttc aggagaatgc tgtgctggtc agttctgtgt gatccttctt
ccctgagttt tatacacagg ctcctcccta aggctgtggc ttctggtggc cctcctgaca
taagttacag tggccaagac caggacaact ccggccatga gctaagtcct gcctaccttc
tccaaaacat tcccatgtcc tcacaggcta ggatgcagat gttggttgga gaggaatttg
tgtgtgtgtg tgtgtgtgtg tgtgttttct tgcctgacct cagtttcatg gatgaaaagt
ggaagctaca gaattatttt caaaaataaa ggctgaattg tctgaaaaaa aaaaaaaaaa
aaaaaaccgg aattc.

10. A plasmid, comprising a DNA sequence encoding a protein
having the amino acid sequence of formula II:

MET LEU LYS LYS GLN SER ALA GLY LEU VAL LEU TRP GLY ALA ILE
LEU PHE VAL ALA TRP ASN ALA LEU LEU LEU LEU PHE PHE TRP THR
ARG PRO ALA PRO GLY ARG PRO PRO SER VAL SER ALA LEU ASP GLY
ASP PRO ALA SER LEU THR ARG GLU VAL ILE ARG LEU ALA GLN ASP
ALA GLU VAL GLU LEU GLU ARG ARG ARG GLY LEU LEU GLN GLN ILE
GLY ASP ALA LEU SER SER GLN ARG GLY ARG VAL PRO THR ALA ALA
PRO PRO ALA GLN PRO ARG VAL PRO VAL THR PRO ALA PRO ALA VAL
ILE PRO ILE LEU VAL ILE ALA CYS ASP ARG SER THR VAL ARG ARG
CYS LEU ASP LYS LEU LEU HIS TYR ARG PRO SER ALA GLU LEU PHE
PRO ILE ILE VAL SER GLN ASP CYS GLY HIS GLU GLU THR ALA GLN
ALA ILE ALA SER TYR GLY SER ALA VAL THR HIS ILE ARG GLN PRO
ASP LEU SER SER ILE ALA VAL PRO PRO ASP HIS ARG LYS PHE GLN
GLY TYR TYR LYS ILE ALA ARG HIS TYR ARG TRP ALA LEU GLY GLN
VAL PHE ARG GLN PHE ARG PHE PRO ALA ALA VAL VAL VAL GLU ASP
ASP LEU GLU VAL ALA PRO ASP PHE PHE GLU TYR PHE ARG ALA THR
TYR PRO LEU LEU LYS ALA ASP PRO SER LEU TRP CYS VAL SER ALA
TRP ASN ASP ASN GLY LYS GLU GLN MET VAL ASP ALA SER ARG PRO
GLU LEU LEU TYR ARG THR ASP PHE PHE PRO GLY LEU GLY TRP LEU
LEU LEU ALA GLU LEU TRP ALA GLU LEU GLU PRO LYS TRP PRO LYS
ALA PHE TRP ASP ASP TRP MET ARG ARG PRO GLU GLN ARG GLN GLY
ARG ALA CYS ILE ARG PRO GLU ILE SER ARG THR MET THR PHE GLY
ARG LYS GLY VAL THR HIS GLY GLN PHE PHE ASP GLN HIS LEU LYS
PHE ILE LYS LEU ASN GLN GLN PHE VAL HIS PHE THR GLN LEU ASP
LEU SER TYR LEU GLN ARG GLU ALA TYR ASP ARG ASP PHE LEU ALA
ARG VAL TYR GLY ALA PRO GLN LEU GLN VAL GLU LYS VAL ARG THR
ASN ASP ARG LYS GLU LEU GLY GLU VAL ARG VAL GLN TYR THR GLY
ARG ASP SER PHE LYS ALA PHE ALA LYS ALA LEU GLY VAL MET ASP
ASP LEU LYS SER GLY VAL PRO ARG ALA GLY TYR ARG GLY ILE VAL
THR PHE GLN PHE ARG GLY ARG ARG VAL HIS LEU ALA PRO PRO PRO
THR TRP GLU GLY TYR ASP PRO SER TRP ASN.


11. The plasmid of Claim 10, wherein said DNA sequence has
the formula V:


,,::r-t i"si:r.! =r" ::i 

S««cr" cUUscS" cccgccagcc tcacccgjs. agtgattcgc "ggccc.ag 

d:i?g lb SI is: nr4S? s p 

ccccceclcc Kclgtlacc cccaccccgg tcatcgcctg tgaccgcagc actgttcggc 
ectecctlga c!a|cticrg catcatcggc cctcggctga gctcttcccc atcatcgtta 

ii SiSH :=H iisis lii ii 

JScrrccc cgcggccgtg grggtggagg argacctgga gg^gg""^ Scgtctcgl 
aetaccttce egccacctat ccgctgctga aggccgaccc crccctgtgg tgcgtcccgg 
cctggaatg! cLcggcaag gagcagatgg tggacgccag caggcctgag "8="^^^^ 
ecaccgac« tttcStggc ccgggctggc tgctgcrggc cgagctctgg g«g^g=^Sg 
fgccclagrg gccaaaglcc ctcliggacg actggatgcg gcggccggag cagcggcagg 
g|=gggt«| catacgScr gagatctcaa gaacgatgac cctrggccgc "gggtgtga 
cLacLgca gccccttgac cagcacctca agtcratcaa gctgaaccag "g«^g^S^ 
acctcScca Ictggacctg tcltacctgc agcgggaggc ctatgaccga gatttccrcg 
cccgcgtcta lgg?|ctccc cagctgcagg tggagaaagt gaggaccaat gaccggaagg 
aectglggga g|tg?gggtg cagtatacgg ggagggacag cttcaaggct "cgccaagg 
c?ct|St|" St|g2|ac c«aagtcgg gggtrccgag agctggctac =gggg""g 
c!accScca gctccgglgc cgccgcgtcc acctggcgcc cccaccgacg tgggagggct 
atgatcctag crggaat . 

12. The plasmid of Claim 10, wherein said DNA sequence has
the formula VI:

aaetttteaa tgtttaagtt tatttaagct tatttctaaa tattttctca tttctctggc 
"SniSt aSgrrtlct: carccatgtr «cri:crcat gag«ar«s tggaratgaa 
ggcrlrcSc Sllaratgr tgatrtttat attacac«c ctcgctcagr "a«a«ga 
Scccrrrea gttrtccagg cacartctca caagtaaaga taatagaaat agcctgcttc 
ctttccIcS ctgcrtrga! trtrcttttc ttggttcatr cgcattggct gcttcctcca 
gcaaaarg« aaLaaccct ggagatga.g ggcaacttcg ttttgcccct g-attcg.g 
Lztecctct ggtgctrccc tgttggtaag gggtcaactg tagccctgag gtgggacatt 
rScStaaa Stcagtcat cccggggcgc ttaggttaga ggaarggtag gcagatgcrg 
rSccccttg ccccrcccct cctccttccc acctggaggg gaaatgaaat ctgacaggra 
!alagagg2 agtrggggrt. ctttttctct ctccctccac cagcatcact ctctgcctct 

gg^aggata tatgrcgac ccc.agagag c"^gg^g« 
aaccccctEg ccctcctcca ccctcaccct tggcccttcc ccgcccccat ttcctctacc 
c^rggggll? ggagccacga gcctrcgrgi: gacggt:tt:gc tttctctctc ctgtct^ag 
gCgcSLct icctcctaat cccatagtcc agaggaggca tccctaggac tgcgggcaag 
LLccgcaa gcccagggca gcctrgaacc gcccccrggc ctgccctccg gtgggggcca 
Sa?gc?gaa |aagc!|cct icagggcrtg tgctgtgggg cgctaccctc cctgtggccr 
..«cccagc acccggcagg ccaccctcag
ggaacgccct: gccgcccc.c Xf^^o^i^ agrga«cgc ^^^^^ 

~- ESS Sris ""-S S E i= 

cgagccagcg ggggaggg^S ^^ratcctee ccatcgcctg tgaccgcis ° ^ta 

cccccgcgcc ggcggtgatt -"^^^^^c ccccggcrga gctcCtcccc 

:::ri-s issx r.s r. ssss === Sb 

iiiiissiiigi55£f| 

!bsr=i rr." sS"-, 

°aS'2=c. |«gs««g 'SULil- gaggacoac 

cccgc8tc« c8Etsc«cc gPgls.cag Ug|g«t?g 

=s s» «:a-g n gisir. 

aceatcctag ccggaatrag cacccgcctg ^ ctgtgtctoc ^^'^"^Sgtg 

Sctgaggt gaccacagtc occaggctgc caaatgataa caagaggatt 

SSiSc W««r -g^|2Sg ggg"ct!« ct:aggg«.g -^^^^^^ 
a«ctcccgt tctcaaggga SJtgggcrt g«ggggcca caaacgtcca 

SS ~-ssn gg s.; k 
SEl Sss -gss ssii Ss'sSt sss 

gccccccgcc cccctctccc ggggcaagca agacccctcc ^"8""^^ 

!gt:t:tai:agc .otgaga.gg --|«|tg| SSccttgr gctgggacaa cc«««^r 
cccagctgtc aggagagagg tgcagggags * ctgaaaatca gtgcccccc 

gcctLcctr cagagaggac "tgccctga aattcgarct gcctgtccct 

rgcrgcrcra ggaggcrcct gctggcrtgg aggrggagca gt:gaccaggc 

cLcLcCg gggrrrgaca ca-ggf « cc.ccccgco cgc.cccagg 

ggagcagtga ccaggac|cc J tggcggccgg gagcatgcga, 

cgcccca.g.

13. A transformed cell containing a heterologous sequence
of DNA encoding a protein having the amino acid sequence of
formula I.

14. The transformed cell of Claim 13, wherein said
heterologous DNA sequence has the formula III.

15. The transformed cell of Claim 13, wherein said
heterologous DNA sequence has the formula IV.

16. A transformed cell containing a heterologous sequence
of DNA encoding a protein having the amino acid sequence of
formula II.

17. The transformed cell of Claim 16, wherein said
heterologous DNA sequence has the formula V.

18. The transformed cell of Claim 16, wherein said
heterologous DNA sequence has the formula VI.

19. A method for preparing a glycoprotein which is a
complex or hybrid N-glycan, comprising: 

culturing a cell which produces a precursor 
high-mannose glycoprotein and which contains a heterologous 
DNA sequence which encodes a protein having the amino acid 
sequence of formula I. 

20. The method of Claim 19, wherein said heterologous DNA 
sequence has the formula III. 

21. The method of Claim 19, wherein said heterologous DNA 
sequence has the formula IV. 

22. A method for preparing a glycoprotein which is a 
complex or hybrid N-glycan, comprising: 

culturing a cell, which produces a precursor 
high-mannose glycoprotein and which contains a heterologous 
DNA sequence which encodes a protein having the amino acid 
sequence of formula II. 

23. The method of Claim 22, wherein said heterologous DNA 
sequence has the formula V. 

24. The method of Claim 23, wherein said heterologous DNA 
sequence has the formula VI. 

Peptide 1:

1        10        20        30
WALGQIFHNFNYPAAVVVEDDLEVAPDFFEYfq

Peptide 2:

1        10
LWAELEPKWPKa

Peptide 3:

1        10
FyDPWMRRPEQ

Peptide 4:

1
TDFFPe

Peptide 5:

1        10
DLSYLQQEAYDRDFl

Peptide 6:

1        10        20
LFRGRRVHLAPPQTWDGYDPSWt

Peptide 7:

1
LGL

Peptide 8:

1
ATYPL

Oligonucleotides:

2S: 5'-TGG GCI GAA CTI GAA CCI AAA TGG-3'
                 G T     G       G

2A: 5'-CCA TTT IGG TTC IAG TTC IGC CCA-3'
           C       C     A C

3S: 5'-TTT TGG GAT GAT TGG ATG CG-3'
         C       C   C         A

3A: 5'-CG CAT CCA ATC ATC CCA AAA-3'
        T         G   G       G

6S: 5'-CAA ACI TGG GAT GGI TAT GAT CC-3'
         G           C       C   C

6A: 5'-GG ATC ATA ICC ATC CCA IGT TTG-3'
          G   G       G           C

FIGURE 2
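
Each sense (S) oligonucleotide above is paired with an antisense (A) oligonucleotide for the same peptide segment, so the two should be reverse complements of one another. The following sketch checks this for the 6S/6A pair. It is an illustration under stated assumptions: the function name `reverse_complement` is hypothetical, only the top line of each oligonucleotide is used (the alternative bases printed underneath are ignored), and inosine (I) is treated as pairing with inosine, since both strands carry I at the same fully degenerate positions.

```python
# Complement table; "I" (inosine) is mapped to itself as a modelling assumption.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "I": "I"}


def reverse_complement(oligo):
    """Reverse complement of an oligonucleotide written 5' to 3'."""
    bases = oligo.replace(" ", "").upper()
    return "".join(COMPLEMENT[base] for base in reversed(bases))


sense_6s = "CAA ACI TGG GAT GGI TAT GAT CC"
antisense_6a = "GG ATC ATA ICC ATC CCA IGT TTG"

print(reverse_complement(sense_6s))
print(reverse_complement(sense_6s) == antisense_6a.replace(" ", ""))
```

The 6S codons read Gln-Thr-Trp-Asp-Gly-Tyr-Asp, matching residues 13 to 19 of Peptide 6, which is why the same check can also be used to confirm that an oligonucleotide pair was designed against the intended peptide.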
   1  gaattccggc aagtcatacc tttgcctgcc ctcccctgtg ggggccagg

  50: atg ctg aag aag cag tct gct ggg ctt gtg ctg tgg ggt gct atc
      MET LEU LYS LYS GLN SER ALA GLY LEU VAL LEU TRP GLY ALA ILE

  95: ctc ttt gtg gcc tgg aat gcc ctg ctg ctc ctc ttc ttc tgg aca
      LEU PHE VAL ALA TRP ASN ALA LEU LEU LEU LEU PHE PHE TRP THR

 140: cgt cca gtg cct agc agg ctg ccg tca gac aat gct ctc gat gat
      ARG PRO VAL PRO SER ARG LEU PRO SER ASP ASN ALA LEU ASP ASP

 185: gac cct gcc agc ctc acc cgt gag gtg atc cgc tta gct cag gat
      ASP PRO ALA SER LEU THR ARG GLU VAL ILE ARG LEU ALA GLN ASP

 230: gcc gag gta gag ttg gaa cgt cag cgg gga ctg ttg cag cag att
      ALA GLU VAL GLU LEU GLU ARG GLN ARG GLY LEU LEU GLN GLN ILE

 275: agg gag cac cat gct ctt tgg agc cag cgg tgg aag gtg cct act
      ARG GLU HIS HIS ALA LEU TRP SER GLN ARG TRP LYS VAL PRO THR

 320: gca gcc cct cct gct cag ccg cat gtg cct gtg acc cca ccg cca
      ALA ALA PRO PRO ALA GLN PRO HIS VAL PRO VAL THR PRO PRO PRO

 365: gct gtg atc ccc atc ctg gta att gcc tgt gac cgc agc acc gtc
      ALA VAL ILE PRO ILE LEU VAL ILE ALA CYS ASP ARG SER THR VAL

 410: cgc cgc tgt ttg gac aag cta ctg cat tat cgg cct tca gct gag
      ARG ARG CYS LEU ASP LYS LEU LEU HIS TYR ARG PRO SER ALA GLU

 455: ctg ttc ccc atc att gtc agc cag gac tgt ggg cat gag gag aca
      LEU PHE PRO ILE ILE VAL SER GLN ASP CYS GLY HIS GLU GLU THR

 500: gcc cag gtc att gct tcc tat ggc agc gca gtc aca cac atc cgg
      ALA GLN VAL ILE ALA SER TYR GLY SER ALA VAL THR HIS ILE ARG

 545: caa cct gac ctg agc aac att gct gtg cag ccc gac cac cgc aag
      GLN PRO ASP LEU SER ASN ILE ALA VAL GLN PRO ASP HIS ARG LYS

 590: ttc cag ggc tac tac aag atc gca cgg cat tac cgc tgg gca ttg
      PHE GLN GLY TYR TYR LYS ILE ALA ARG HIS TYR ARG TRP ALA LEU

 635: ggc caa atc ttc cac aat ttc aac tac cca gca gct gtg gtg gtg
      GLY GLN ILE PHE HIS ASN PHE ASN TYR PRO ALA ALA VAL VAL VAL

 680: gaa gat gat ctc gag gtg gca cca gac ttc ttt gag tac ttc cag
      GLU ASP ASP LEU GLU VAL ALA PRO ASP PHE PHE GLU TYR PHE GLN

 725: gcc act tac cca ctg ttg aaa gca gac ccc tcc ctc tgg tgt gtg
      ALA THR TYR PRO LEU LEU LYS ALA ASP PRO SER LEU TRP CYS VAL

 770: tct gcc tgg aat gac aat ggc aaa gaa cag atg gta gac tcg agt
      SER ALA TRP ASN ASP ASN GLY LYS GLU GLN MET VAL ASP SER SER

 815: aag cca gag tta ctc tac cgc aca gat ttc ttt cct ggc tta ggc
      LYS PRO GLU LEU LEU TYR ARG THR ASP PHE PHE PRO GLY LEU GLY

 860: tgg tta ctg ttg gct gaa ctc tgg gct gaa ctg gag ccc aag tgg
      TRP LEU LEU LEU ALA GLU LEU TRP ALA GLU LEU GLU PRO LYS TRP

 905: ccc aaa gcc ttc tgg gat gac tgg atg cgc cgg cct gag cag cga
      PRO LYS ALA PHE TRP ASP ASP TRP MET ARG ARG PRO GLU GLN ARG

 950: aag ggg agg gcc tgt gtg cgt cca gaa atc tca aga aca atg aca
      LYS GLY ARG ALA CYS VAL ARG PRO GLU ILE SER ARG THR MET THR

 995: ttt ggc cgg aag ggt gtg agc cat ggg cag ttc ttt gac cag cat
      PHE GLY ARG LYS GLY VAL SER HIS GLY GLN PHE PHE ASP GLN HIS

1040: ctc aag ttc atc aag ctg aac cag cag ttt gta ccc ttc acc cag
      LEU LYS PHE ILE LYS LEU ASN GLN GLN PHE VAL PRO PHE THR GLN

1085: ctg gac ctg tcg tac ctt cag cag gag gcc tat gac cgg gat ttc
      LEU ASP LEU SER TYR LEU GLN GLN GLU ALA TYR ASP ARG ASP PHE

1130: ctt gct cgt gtt tat ggt gct ccc cag tta cag gtg gag aaa gtg
      LEU ALA ARG VAL TYR GLY ALA PRO GLN LEU GLN VAL GLU LYS VAL

1175: agg acc aat gac cgg aag gag cta gga gag gtg cgc gta cag tac
      ARG THR ASN ASP ARG LYS GLU LEU GLY GLU VAL ARG VAL GLN TYR

1220: aca ggc agg gac agc ttc aag gct ttc gcc aag gcc ctg ggt gtc
      THR GLY ARG ASP SER PHE LYS ALA PHE ALA LYS ALA LEU GLY VAL

1265: atg gat gac ctc aaa tca ggt gta ccc agg gct gga tac cgg ggc
      MET ASP ASP LEU LYS SER GLY VAL PRO ARG ALA GLY TYR ARG GLY

1310: att gtc acc ttc tta ttc cgg ggc cgc cgt gtc cac ctg gcg ccc
      ILE VAL THR PHE LEU PHE ARG GLY ARG ARG VAL HIS LEU ALA PRO

1355: cct cag act tgg gat ggc tat gat cct agt tgg act
      PRO GLN THR TRP ASP GLY TYR ASP PRO SER TRP THR

1391  taacagctcc tgcctgtccc ttctgggctc cttccttgca atttcatgat ctaagatggg
1451  accgtagtcc ctgggctgca ttgtcttttc tgtctttccc tcttgggtcc attttttttt
1511  ttttcttttt tgagtggcat ttgaatacac agatgacaag gtgagggttc ttttgttaaa
1571  ggagttagat cagggaaagc attctgctgt ctgttgggta tcaagcagca aaccactgtg
1631  tgatagggga agaatgggct ttttggggcc agaaatatcc atgttctgag tttttctctt
1691  aggtcatctg cagaggagtt ggcaacttta gctttcttaa ccaggccttt tctttctgac
1751  ctgagagcca gggcatgaga cttcttgttc atgctccttt ttaccttccc ctaataaggg
1811  tctgggctac aggagaagtg aacatattgt ggccagaata atactaacca gaggggcctc
1871  attgtcagag tctaggtgca gttattgggt tgtcagagtt aatgccttct gttcttcttt
1931  ccttattcct gacttctgtc agctcttctt tctttgcagc ctagcaattt ttggttctaa
1991  gatgaaaaat gaagaggaaa agaaatattc gcacccagct attgggagaa aggtagtggg
2051  aaaaaaactt cattgtacca cttcaaagag acactcttga cctcttcctt tctaaaaatt
2111  agtcccctcc ctgttgcttc aggagaatgc tgtgctggtc agttctgtgt gatccttctt
2171  ccctgagttt tatacacagg ctcctcccta aggctgtggc ttctggtggc cctcctgaca
2231  taagttacag tggccaagac caggacaact ccggccatga gctaagtcct gcctaccttc
2291  tccaaaacat tcccatgtcc tcacaggcta ggatgcagat gttggttgga gaggaatttg
2351  tgtgtgtgtg tgtgtgtgtg tgtgttttct tgcctgacct cagtttcatg gatgaaaagt
2411  ggaagctaca gaattatttt caaaaataaa ggctgaattg tctgaaaaaa aaaaaaaaaa
2471  aaaaaaccgg aattc
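
The pairing of codons and residues in the rabbit cDNA sequence above follows the standard genetic code and can be reproduced programmatically. The sketch below is illustrative: the names `CODON_TABLE` and `translate` are hypothetical, the table is built from the conventional 64-letter listing of the standard code in t, c, a, g order, and the two inputs are the first coding row (base 50) and the last coding row (base 1355) followed by the first three bases of the 3'-untranslated region.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order t, c, a, g.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string in frame 1; '*' marks a stop codon."""
    seq = dna.replace(" ", "").lower()
    usable = len(seq) - len(seq) % 3  # ignore an incomplete trailing codon
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


first_row = "atg ctg aag aag cag tct gct ggg ctt gtg ctg tgg ggt gct atc"
last_row = "cct cag act tgg gat ggc tat gat cct agt tgg act taa"

print(translate(first_row))
print(translate(last_row))
```

The first row translates to the one-letter equivalent of MET LEU LYS LYS GLN SER ALA GLY LEU VAL LEU TRP GLY ALA ILE, and the last row ends in a stop at the taa that opens the 3'-untranslated region at base 1391.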
1 aagttctgaa cgcccaagtt tattcaagtt tatctccaaa tatttcctca tttctctggc
61 ttctgcaagc agggtttcct catccatgtt tccttctcat gagctatttg cggatatgaa 
121 ggccatccat tagtatacgt tgatttttat attacacttc cttgctcagt tcattattga
181 ttcttttcga gctttccagg catactctca caagtaaaga taatagaaac agtttgctcc
241 ctttccactt ccgctctgaa tctctctttc ttggcccact cgcactggct gcccccccca
301 gcaaaatgtc aaataaccct ggagatgatg ggcaacttcg tcttgctcct gacactcgtg
361 gggtgcctcc ggtgcctccc cgttggtaag gggctaaccg cagccctgag gtgggacatt
421 tgattttaaa aatcagtcat cttggggcgc ttaggttaga ggaatggtag gcagatgccg 
481 tcactccttg cccctcccct cctccttccc acctggaggg gaaatgaaat ctgacaggta
541 gaaagagggg agttggggtt ctttttctct ctccctccac cagcatcact ctctgcctct 
601 ccctcaaaaa tacgtccctg ggtcaggata tatgttgact ccctagagag ccctggagtc 
661 aacctcccgg ccttcctcca ccctcactct tggccttttc ctgcccccat ttcctcracc 
721 tgtggggcat ggagccacga gcctttgtgt gacggtttgc tttctctctc ctgtctttag 
781 gtgcatggcc gcctcctaat cccatagtcc agaggaggca tccctaggac tgcgggcaag 
841 ggagccgcaa gcccagggca gccttgaacc gtcccctggc ctgccctccg gtgggggcca 
901 ggatgctgaa gaagcagtct gcagggcttg tgctgtgggg cgccatcctc tttgtggcct 
961 ggaatgccct gctgctcctc ttcttctgga cgcgcccagc acctggcagg ccaccctcag 
1021 tcagcgctct cgatggcgac cccgccagcc tcacccggga agtgattcgc ctggcccaag 
1081 acgccgaggt ggagctggag cgcaggcgtg ggctgctgca gcagatcggg gatgccctgt 
1141 cgagccagcg ggggagggtg cccaccgcgg cccctcccgc ccagccgcgt gtgcctgtga 
1201 cccccgcgcc ggcggtgatt cccatcctgg tcatcgcctg tgaccgcagc actgttcggc 
1261 gctgcctgga caagctgctg cattatcggc cctcggctga gctcttcccc atcatcgtta 
1321 gccaggaccg cgggcacgag gagacggccc aggccatcgc ctcctacggc agcgcggtca 
1381 cgcacacccg gcagcccgac ctgagcagca tcgcggtgcc gccggaccac cgcaagttcc 
1441 agggctacta caagatcgcg cgccactacc gctgggcgct gggccaggtc ttccggcagt 
1501 ttcgcttccc cgcggccgtg gtggtggagg atgacctgga ggtggccccg gacttcttcg 
1561 agtactttcg ggccacctat ccgctgctga aggccgaccc ctccctgtgg tgcgtctcgg 
1621 cctggaatga caacggcaag gagcagatgg tggacgccag caggcctgag ctgctctacc 
1681 gcaccgactt tttccctggc ctgggctggc tgctgttggc cgagctctgg gctgagctgg 
1741 agcccaagtg gccaaaggcc ttctgggacg actggatgcg gcggccggag cagcggcagg 
1801 ggcgggcctg catacgccct gagatctcaa gaacgatgac ctttggccgc aagggtgtga 
1861 cgcacgggca gttctttgac cagcacctca agtttatcaa gctgaaccag cagtttgtgc 
1921 acttcaccca gctggacctg tcttacctgc agcgggaggc ctatgaccga gatttcctcg 
1981 cccgcgtcta cggtgctccc cagctgcagg tggagaaagt gaggaccaat gaccggaagg 
2041 agctggggga ggtgcgggtg cagtatacgg ggagggacag cttcaaggct ttcgccaagg 
2101 ctctgggtgt tatggatgac cttaagtcgg gggttccgag agctggctac cggggcattg 
2161 tcaccttcca gttccggggc cgccgtgtcc acctggcgcc cccaccgacg cgggagggct 
2221 atgatcctag ctggaattag cacctgcctg tccttcctgg gccccttctt gccacatcat
2281 gagctgaggt gaccacagtc cccaggctgc atcggcctgc ctgtgtttcc ctcttaggtg 
2341 catttatctt tttgattttt ccgagcggca tttaagtgca caaacgataa caagaggatt 
2401 attctcccgt tctcaaggga gtcagatcag gggaactatt ctagggtatg ttgcggggta 
2461 ttaagcagga aaacactgtg cggtgggggg cactgggctt gttggggcca caaatgtcca 
2521 cgtcctgagc tttctcctgg agcatgtgca gagagtttgg caacgttcgc tctcttgacc 
2581 agaccccttc tccctgactg gctcttccag ccaggcacga gccccccctc tatacctgcc 
2641 ccccttccca gtggggactg agttatggga gaaggggaca tatttgtggc caaaatgata 
2701 ctaaccaaag gggcttcctt gtcagggcct ggcggagttg gtgggtcatc ggggctcacc
2761 gcctcctgcc cttctctcct gtctgacccc cacttagccc ttctctcctc gcagcctagc 
2821 agtttatagt tctgagatgg aaagttgaag ggggcaagca agacctctcc tcagcccatg
2881 cccagctgtc aggagagagg tgcagggagg aaggccttgt gctgggacaa cctctctctt 
2941 gccttacctt cagagaggac tatgccctga cccctccttt ctgaaaatca gtgccctccc 
3001 tgttgctcta ggaggctcct gctggcttgg tagaagacag aattcgatct gcctgtccct 
3061 ttttcccctg gggtttgaca cacaggctcc tctcagcatg aggtggagca gcgaccaggt 
3121 ggagcagtga ccaggacgcc tctggcccag tgctgcccag cccccccgcc cgctcccagg 
3181 cgccccatgt cctcacaggc caggacgcca tggcggccgg gagcatgcga 
START

880: c c tgc cct ccg gtg ggg gcc agg|atg ctg aag aag cag tct gca
3: . . CYS PRO PRO VAL GLY ALA ARG|MET LEU LYS LYS GLN SER ALA

924: ggg ctt gtg ctg tgg ggc gct atc ctc ttt gtg gcc tgg aat gcc
3: GLY LEU VAL LEU TRP GLY ALA ILE LEU PHE VAL ALA TRP ASN ALA

969: ctg ctg ctc ctc ttc ttc tgg acg cgc cca gca cct ggc agg cca
3: LEU LEU LEU LEU PHE PHE TRP THR ARG PRO ALA PRO GLY ARG PRO

1014: ccc tca gtc agc gct ctc gat ggc gac ccc gcc agc ctc acc cgg
3: PRO SER VAL SER ALA LEU ASP GLY ASP PRO ALA SER LEU THR ARG

1059: gaa gtg att cgc ctg gcc caa gac gcc gag gtg gag ctg gag cgc
3: GLU VAL ILE ARG LEU ALA GLN ASP ALA GLU VAL GLU LEU GLU ARG

1104: agg cgt ggg ctg ctg cag cag atc ggg gat gcc ctg tcg agc cag
3: ARG ARG GLY LEU LEU GLN GLN ILE GLY ASP ALA LEU SER SER GLN

1149: cgg ggg agg gtg ccc acc gcg gcc cct ccc gcc cag ccg cgt gtg
3: ARG GLY ARG VAL PRO THR ALA ALA PRO PRO ALA GLN PRO ARG VAL

1194: cct gtg acc ccc gcg ccg gcg gtg att ccc atc ctg gtc atc gcc
3: PRO VAL THR PRO ALA PRO ALA VAL ILE PRO ILE LEU VAL ILE ALA

1239: tgt gac cgc agc act gtt cgg cgc tgc ctg gac aag ctg ctg cat
3: CYS ASP ARG SER THR VAL ARG ARG CYS LEU ASP LYS LEU LEU HIS

1284: tat cgg ccc tcg gct gag ctc ttc ccc atc atc gtt agc cag gac
3: TYR ARG PRO SER ALA GLU LEU PHE PRO ILE ILE VAL SER GLN ASP

1329: tgc ggg cac gag gag acg gcc cag gcc atc gcc tcc tac ggc agc
3: CYS GLY HIS GLU GLU THR ALA GLN ALA ILE ALA SER TYR GLY SER

1374: gcg gtc acg cac atc cgg cag ccc gac ctg agc agc att gcg gtg
3: ALA VAL THR HIS ILE ARG GLN PRO ASP LEU SER SER ILE ALA VAL

1419: ccg ccg gac cac cgc aag ttc cag ggc tac tac aag atc gcg cgc
3: PRO PRO ASP HIS ARG LYS PHE GLN GLY TYR TYR LYS ILE ALA ARG

1464: cac tac cgc tgg gcg ctg ggc cag gtc ttc cgg cag ttt cgc ttc
3: HIS TYR ARG TRP ALA LEU GLY GLN VAL PHE ARG GLN PHE ARG PHE

1509: ccc gcg gcc gtg gtg gtg gag gat gac ctg gag gtg gcc ccg gac
3: PRO ALA ALA VAL VAL VAL GLU ASP ASP LEU GLU VAL ALA PRO ASP

1554: ttc ttc gag tac ttt cgg gcc acc tat ccg ctg ctg aag gcc gac
3: PHE PHE GLU TYR PHE ARG ALA THR TYR PRO LEU LEU LYS ALA ASP

1599: ccc tcc ctg tgg tgc gtc tcg gcc tgg aat gac aac ggc aag gag
3: PRO SER LEU TRP CYS VAL SER ALA TRP ASN ASP ASN GLY LYS GLU

1644: cag atg gtg gac gcc agc agg cct gag ctg ctc tac cgc acc gac
3: GLN MET VAL ASP ALA SER ARG PRO GLU LEU LEU TYR ARG THR ASP

1689: ttt ttc cct ggc ctg ggc tgg ctg ctg ttg gcc gag ctc tgg gct
3: PHE PHE PRO GLY LEU GLY TRP LEU LEU LEU ALA GLU LEU TRP ALA

1734: gag ctg gag ccc aag tgg cca aag gcc ttc tgg gac gac tgg atg
3: GLU LEU GLU PRO LYS TRP PRO LYS ALA PHE TRP ASP ASP TRP MET

1779: cgg cgg ccg gag cag cgg cag ggg cgg gcc tgc ata cgc cct gag
3: ARG ARG PRO GLU GLN ARG GLN GLY ARG ALA CYS ILE ARG PRO GLU

1824: atc tca aga acg atg acc ttt ggc cgc aag ggt gtg acg cac ggg
3: ILE SER ARG THR MET THR PHE GLY ARG LYS GLY VAL THR HIS GLY

1869: cag ttc ttt gac cag cac ctc aag ttt atc aag ctg aac cag cag
3: GLN PHE PHE ASP GLN HIS LEU LYS PHE ILE LYS LEU ASN GLN GLN

1914: ttt gtg cac ttc acc cag ctg gac ctg tct tac ctg cag cgg gag
3: PHE VAL HIS PHE THR GLN LEU ASP LEU SER TYR LEU GLN ARG GLU

1959: gcc tat gac cga gat ttc ctc gcc cgc gtc tac ggt gct ccc cag
3: ALA TYR ASP ARG ASP PHE LEU ALA ARG VAL TYR GLY ALA PRO GLN

2004: ctg cag gtg gag aaa gtg agg acc aat gac cgg aag gag ctg ggg
3: LEU GLN VAL GLU LYS VAL ARG THR ASN ASP ARG LYS GLU LEU GLY

2049: gag gtg cgg gtg cag tat acg ggg agg gac agc ttc aag gct ttc
3: GLU VAL ARG VAL GLN TYR THR GLY ARG ASP SER PHE LYS ALA PHE

2094: gcc aag gct ctg ggt gtt atg gat gac ctt aag tcg ggg gtt ccg
3: ALA LYS ALA LEU GLY VAL MET ASP ASP LEU LYS SER GLY VAL PRO

2139: aga gct ggc tac cgg ggt att gtc acc ttc cag ttc cgg ggc cgc
3: ARG ALA GLY TYR ARG GLY ILE VAL THR PHE GLN PHE ARG GLY ARG

2184: cgt gtc cac ctg gcg ccc cca ccg acg tgg gag ggc tat gat cct
3: ARG VAL HIS LEU ALA PRO PRO PRO THR TRP GLU GLY TYR ASP PRO
STOP

2229: agc tgg aat|tag cac ctg cct g
3: SER TRP ASN|*** HIS LEU PRO .



FIGURE 8 (continued) 